fixes the kron implementation for sparse + diagonal matrix #2804

Open · wants to merge 6 commits into master
10 changes: 6 additions & 4 deletions lib/cusparse/linalg.jl
@@ -82,7 +82,7 @@ function LinearAlgebra.kron(A::CuSparseMatrixCOO{T, Ti}, B::CuSparseMatrixCOO{T,
sparse(row, col, data, out_shape..., fmt = :coo)
end

-function LinearAlgebra.kron(A::CuSparseMatrixCOO{T, Ti}, B::Diagonal) where {Ti, T}
+function LinearAlgebra.kron(A::CuSparseMatrixCOO{T, Ti}, B::Diagonal{TB}) where {Ti, T, TB}
mA,nA = size(A)
mB,nB = size(B)
out_shape = (mA * mB, nA * nB)
@@ -102,12 +102,13 @@ function LinearAlgebra.kron(A::CuSparseMatrixCOO{T, Ti}, B::Diagonal) where {Ti,
row .+= CuVector(repeat(0:nB-1, outer = Annz)) .+ 1
col .+= CuVector(repeat(0:nB-1, outer = Annz)) .+ 1

-data .*= repeat(CUDA.ones(T, nB), outer = Annz)
+Bdiag = (TB == Bool) ? CUDA.ones(T, nB) : B.diag
+data .*= repeat(Bdiag, outer = Annz)

sparse(row, col, data, out_shape..., fmt = :coo)
end
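For reference, the index arithmetic this method relies on can be reproduced on the CPU with SparseArrays. The `kron_coo_diag` helper below is a hypothetical sketch (not part of this PR): each stored entry `(i, j, v)` of `A` expands into `nB` entries `((i-1)*nB + k, (j-1)*nB + k, v * d[k])`, which is the same repeat/inner/outer pattern the GPU code applies to `CuVector`s.

```julia
using SparseArrays, LinearAlgebra

# CPU sketch (not part of this PR) of the COO-style kron(A, Diagonal(d)):
# each stored entry (i, j, v) of A expands into nB entries
# ((i-1)*nB + k, (j-1)*nB + k, v * d[k]) for k = 1:nB.
function kron_coo_diag(A::SparseMatrixCSC, d::AbstractVector)
    nB = length(d)
    I, J, V = findnz(A)
    row  = (repeat(I, inner = nB) .- 1) .* nB .+ repeat(1:nB, outer = length(I))
    col  = (repeat(J, inner = nB) .- 1) .* nB .+ repeat(1:nB, outer = length(J))
    data = repeat(V, inner = nB) .* repeat(d, outer = length(V))
    sparse(row, col, data, size(A, 1) * nB, size(A, 2) * nB)
end
```

Note how the diagonal values are broadcast with `outer = nnz(A)`, matching the `repeat(Bdiag, outer = Annz)` line in the diff above.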

-function LinearAlgebra.kron(A::Diagonal, B::CuSparseMatrixCOO{T, Ti}) where {Ti, T}
+function LinearAlgebra.kron(A::Diagonal{TA}, B::CuSparseMatrixCOO{T, Ti}) where {Ti, T, TA}
mA,nA = size(A)
mB,nB = size(B)
out_shape = (mA * mB, nA * nB)
@@ -122,7 +123,8 @@ function LinearAlgebra.kron(A::Diagonal, B::CuSparseMatrixCOO{T, Ti}) where {Ti,
row = CuVector(repeat(row, inner = Bnnz))
col = (0:nA-1) .* nB
col = CuVector(repeat(col, inner = Bnnz))
-data = repeat(CUDA.ones(T, nA), inner = Bnnz)
+Adiag = (TA == Bool) ? CUDA.ones(T, nA) : A.diag
tam724 (Author) commented:
I would also like an opinion about these lines. This still leads to unexpected behaviour when the diagonal of B contains false elements, but at first I did not want to break with the previous behaviour of the function.
My idea for a clean solution would be to only allow Diagonal{TD, <:CuVector{TD}}, but this would break code that depends on the previous behaviour.

Member commented:
Why can't we implement the Bool case as multiplication by diag? Relying on false elements still performing multiplication as if by I seems like something we shouldn't support. The behavior here should simply match Base, so I guess we can remove the Bool/I special casing altogether?
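The Base/SparseArrays semantics referred to here can be checked on the CPU: `kron` multiplies by the stored diagonal values (a `false` entry zeroes its positions), and `I(n)` is simply a `Diagonal{Bool}` whose entries are all `true`.

```julia
using SparseArrays, LinearAlgebra

A = sparse([1.0 2.0; 3.0 4.0])

# I(n) is a Diagonal{Bool, Vector{Bool}}; kron treats it as an identity block.
K1 = kron(A, I(2))

# A Diagonal with a false entry zeroes the corresponding positions —
# SparseArrays multiplies by the stored values, it does not assume I.
K2 = kron(A, Diagonal([true, false]))
```

Matching this behaviour on the GPU is exactly what removing the `Bool`/`I` special casing would achieve.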

tam724 (Author) commented on Jun 30, 2025:
Currently somebody might expect kron(cu(sprand(2, 2, 1.0)), I(2)) to work. This would error afterwards.

Member commented:
That works with SparseArrays.jl, so would have to work here too.

tam724 (Author) commented:
should it?

julia> typeof(I(2))
Diagonal{Bool, Vector{Bool}}

I would expect the CUDA kron to work with the CUDA Diagonal

julia> typeof(cu(I(2)))
Diagonal{Bool, CuArray{Bool, 1, CUDA.DeviceMemory}}

Member commented:
I'm not following. I'm saying that because kron(sprand(2, 2, 1.0), I(2)) works on the CPU, with Array, it should work on the GPU with CuArray.

tam724 (Author) commented on Jul 1, 2025:
Sorry for the confusion, but I think we are on the same page.
Before this PR the method allowed kron for the combination of a sparse CuArray and a CPU Diagonal (by ignoring the diag field of the Diagonal and incorrectly assuming that every Diagonal represents a sized identity matrix).

Handling the Diagonal correctly would disallow this combination but break with the previous behaviour. If we are okay with that, CUDA.jl should implement

LinearAlgebra.kron(A::CuSparseMatrixCOO{<:Number, <:Integer}, B::Diagonal{<:Number, <:CuVector{<:Number}})

That would break user code that "worked" before this PR, like:

julia> A = cu(sprand(2, 2, 1.0))
2×2 CuSparseMatrixCSC{Float32, Int32} with 4 stored entries:
 0.98641366  0.92539436
 0.80849653  0.48799357

julia> kron(A, I(2)) # after this PR this would error with ERROR: Scalar indexing is disallowed.
4×4 CuSparseMatrixCSC{Float32, Int32} with 8 stored entries:
 0.98641366             0.92539436    
            0.98641366             0.92539436
 0.80849653             0.48799357    
            0.80849653             0.48799357

julia> kron(A, cu(I(2))) # resolve by moving the Diagonal to the GPU first

I'm hesitant because I don't want to break others' code, and because this was also tested behaviour:

@testset "kronecker product with I opa = $opa" for opa in (identity, transpose, adjoint)
    @test collect(kron(opa(dA), C)) ≈ kron(opa(A), C)
    @test collect(kron(C, opa(dA))) ≈ kron(C, opa(A))
    @test collect(kron(opa(dZA), C)) ≈ kron(opa(ZA), C)
    @test collect(kron(C, opa(dZA))) ≈ kron(C, opa(ZA))
end

Member commented:
I don't have the time right now to look at this myself, so sorry for the naive questions, but why isn't it possible to support both I inputs as well as Diagonal ones, with the latter respecting the actual diagonal values, by correctly determining the value to broadcast instead of hard-coding CUDA.ones? Or, worst case, by using two different methods?
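On the CPU, "correctly determining the value to broadcast" amounts to something like the sketch below. `broadcast_diag` is a hypothetical helper, not from the PR; in CUDA.jl the resulting vector would be a `CuVector` instead of a `Vector`.

```julia
using LinearAlgebra

# Hypothetical sketch: instead of hard-coding ones(T, n), convert the stored
# diagonal to the output element type. This covers both I(n) (an all-true
# Bool diagonal) and a numeric Diagonal, and a false entry correctly
# contributes zero rather than being treated as one.
broadcast_diag(::Type{T}, B::Diagonal) where {T} = convert(Vector{T}, B.diag)

broadcast_diag(Float32, I(2))                    # identity case, all ones
broadcast_diag(Float32, Diagonal([true, false])) # respects the false entry
```

With a single method like this, no `Bool` special casing is needed, and no second method is required unless the device transfer is meant to be explicit.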

data = repeat(Adiag, inner = Bnnz)

row .+= repeat(B.rowInd .- 1, outer = Annz) .+ 1
col .+= repeat(B.colInd .- 1, outer = Annz) .+ 1
8 changes: 8 additions & 0 deletions test/libraries/cusparse/linalg.jl
@@ -7,10 +7,12 @@ m = 10
B = sprand(T, m, m, 0.3)
ZA = spzeros(T, m, m)
C = I(div(m, 2))
D = Diagonal(rand(T, m))
@testset "type = $typ" for typ in [CuSparseMatrixCSR, CuSparseMatrixCSC]
dA = typ(A)
dB = typ(B)
dZA = typ(ZA)
dD = Diagonal(CuArray(D.diag))
Member commented:
adapt(CuArray, D) is more idiomatic.

tam724 (Author) commented:
Sure.

@testset "opnorm and norm" begin
@test opnorm(A, Inf) ≈ opnorm(dA, Inf)
@test opnorm(A, 1) ≈ opnorm(dA, 1)
@@ -42,6 +44,12 @@ m = 10
@test collect(kron(opa(dZA), C)) ≈ kron(opa(ZA), C)
@test collect(kron(C, opa(dZA))) ≈ kron(C, opa(ZA))
end
@testset "kronecker product with Diagonal opa = $opa" for opa in (identity, transpose, adjoint)
@test collect(kron(opa(dA), dD)) ≈ kron(opa(A), D)
@test collect(kron(dD, opa(dA))) ≈ kron(D, opa(A))
@test collect(kron(opa(dZA), dD)) ≈ kron(opa(ZA), D)
@test collect(kron(dD, opa(dZA))) ≈ kron(D, opa(ZA))
end
end
end
