Describe the bug
Using copyto! with a destination on the device and a view of an array on the host causes scalar indexing.
To reproduce
The Minimal Working Example (MWE) for this bug:
using CUDA
full_cpu_arr = collect(1:128)
partial_cpu_arr = view(full_cpu_arr, 1:16)
# Preallocate the destination array on the device
a_gpu = CUDA.zeros(Int, length(partial_cpu_arr))
# The following causes scalar indexing
copyto!(a_gpu, partial_cpu_arr)
Manifest.toml
CUDA v4.0.1
Expected behavior
For a view with a contiguous range (i.e. a UnitRange), this should behave exactly like copying a normal dense array, but with an offset pointer and copying fewer elements.
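To illustrate the "offset pointer" idea on the host side only, here is a minimal sketch that needs no GPU. It assumes nothing beyond Base Julia; the variable names are made up for illustration. It checks that a UnitRange view of a Vector exposes a pointer into the parent's memory and that a raw pointer copy of the view's length yields the expected elements.

```julia
full = collect(1:128)
part = view(full, 17:32)   # contiguous view, starts at the 17th element

# The view's pointer is the parent's pointer advanced by 16 elements.
@assert pointer(part) == pointer(full, 17)

# A view indexed by a UnitRange has unit stride, like a dense vector.
@assert strides(part) == (1,)

# Copy through raw pointers, as a dense-array copy would do.
dest = zeros(Int, length(part))
GC.@preserve full dest begin
    unsafe_copyto!(pointer(dest), pointer(part), length(part))
end

@assert dest == 17:32
```

A host-to-device copy of such a view could in principle take the same route, with the destination pointer living on the device.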
Version info
Details on Julia: 1.8.5
# please post the output of:
versioninfo()
Details on CUDA:
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × AMD Ryzen Threadripper 3970X 32-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 16 on 64 virtual cores
Environment:
JULIA_NUM_THREADS = 16
JULIA_EDITOR = code
Additional context
I was able to work around the problem with some type piracy, since the SubArray type (i.e. what comes out of view) implements the pointer function. The type signatures could probably be made broader, as this version only allows Ints to be copied:
Base.copyto!(dest::CuArray{T}, src::SubArray{T, 1, Vector{T}}) where {T<:Int} =
    copyto!(dest, 1, src, 1, length(src))

function Base.copyto!(dest::CuArray{T}, doffs::Integer, src::SubArray{T, 1, Vector{T}}, soffs::Integer,
                      n::Integer) where {T<:Int}
    n==0 && return dest
    @boundscheck checkbounds(dest, doffs)
    @boundscheck checkbounds(dest, doffs+n-1)
    @boundscheck checkbounds(src, soffs)
    @boundscheck checkbounds(src, soffs+n-1)
    unsafe_copyto!(dest, doffs, src, soffs, n)
    return dest
end

function Base.unsafe_copyto!(dest::CuArray{T}, doffs,
                             src::SubArray{T, 1, Vector{T}}, soffs, n) where {T<:Int}
    CUDA.context!(CUDA.context(dest)) do
        # operations on unpinned memory cannot be executed asynchronously, and synchronize
        # without yielding back to the Julia scheduler. prevent that by eagerly synchronizing.
        s = CUDA.stream()
        CUDA.is_pinned(pointer(src)) || CUDA.nonblocking_synchronize(s)
        GC.@preserve src dest begin
            CUDA.unsafe_copyto!(pointer(dest, doffs), pointer(src, soffs), n; async=true)
        end
    end
    return dest
end
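As a rough sketch of how the signatures above might be broadened beyond Int, one option is a type alias that matches only views of a Vector indexed by a UnitRange, for any element type. The alias name ContiguousHostView and the helper takes_fast_path are hypothetical and not part of CUDA.jl; the snippet is CPU-only and just shows which arguments such a signature would select.

```julia
# Hypothetical alias (not part of CUDA.jl): a 1-D view of a Vector
# indexed by a UnitRange, for any element type T.
const ContiguousHostView{T} = SubArray{T,1,Vector{T},Tuple{UnitRange{Int}},true}

# Hypothetical helper, used only to show what the alias dispatches on.
takes_fast_path(::ContiguousHostView) = true
takes_fast_path(::AbstractArray) = false

data = collect(1.0f0:128.0f0)   # Vector{Float32}, not Int

@assert takes_fast_path(view(data, 1:16))         # contiguous range view
@assert !takes_fast_path(view(data, 1:2:31))      # strided view (StepRange)
@assert !takes_fast_path(view(data, [1, 3, 5]))   # vector-indexed view
@assert !takes_fast_path(data)                    # plain Vector, handled elsewhere
```

Restricting the index type to UnitRange keeps strided and vector-indexed views, which are not contiguous in memory, off the pointer-based path. Views created with a colon or Base.OneTo use other index types and would need their own handling under this sketch.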