Description
I'm implementing an algorithm on GPU where Float32 isn't accurate enough but Float64 incurs a 64x runtime penalty I'd like to avoid. The Fixed type is appealing because I can get the accuracy I need with a 32-bit type, and I have good prior knowledge of the dynamic range. I thought I'd give it a try but immediately ran into problems. For example,
using CUDA
using FixedPointNumbers
x_cpu = Float32[1, 2, 3]
x_gpu = cu(x_cpu)
T = Q24f7
T.(x_cpu) # no problem
T.(x_gpu) # fails
log output
warning: linking module flags 'Dwarf Version': IDs have conflicting values ('i32 4' from globals with 'i32 2' from start)
ERROR: LoadError: InvalidIRError: compiling MethodInstance for (::GPUArrays.var"#34#36")(::CUDA.CuKernelContext, ::CuDeviceVector{Q24f7, 1}, ::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, CUDA.var"#1155#1156"{Q24f7}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, ::Int64) resulted in invalid LLVM IR
Reason: unsupported call to an external C function (call to jl_string_to_genericmemory)
Reason: unsupported call to an external C function (call to jl_genericmemory_to_string)
Reason: unsupported call to an external C function (call to ijl_pchar_to_string)
Reason: unsupported dynamic function invocation (call to !=)
Stacktrace:
[1] show
@ ./ryu/Ryu.jl:115
[2] multiple call sites
@ unknown:0
Reason: unsupported dynamic function invocation (call to writeshortest)
Stacktrace:
[1] show
@ ./ryu/Ryu.jl:116
[2] multiple call sites
@ unknown:0
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] show
@ ./ryu/Ryu.jl:114
[5] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] show
@ ./ryu/Ryu.jl:114
[5] multiple call sites
@ unknown:0
Reason: unsupported dynamic function invocation (call to nonnothingtype)
Stacktrace:
[1] nonnothing_nonmissing_typeinfo
@ ./show.jl:1244
[2] show
@ ./ryu/Ryu.jl:115
[3] multiple call sites
@ unknown:0
Reason: unsupported dynamic function invocation (call to nonmissingtype)
Stacktrace:
[1] nonnothing_nonmissing_typeinfo
@ ./show.jl:1244
[2] show
@ ./ryu/Ryu.jl:115
[3] multiple call sites
@ unknown:0
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] take!
@ ./iobuffer.jl:467
[5] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] take!
@ ./iobuffer.jl:467
[5] multiple call sites
@ unknown:0
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] take!
@ ./iobuffer.jl:471
[5] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] take!
@ ./iobuffer.jl:471
[5] multiple call sites
@ unknown:0
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] take!
@ ./iobuffer.jl:476
[5] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] StringVector
@ ./iobuffer.jl:45
[4] take!
@ ./iobuffer.jl:476
[5] multiple call sites
@ unknown:0
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) @ Base strings/io.jl:137)
Stacktrace:
[1] string
@ ./strings/io.jl:189
[2] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:322
[3] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[4] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[5] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[7] _broadcast_getindex
@ ./broadcast.jl:646
[8] getindex
@ ./broadcast.jl:605
[9] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to a lazy-initialized function (call to jl_genericmemory_to_string)
Stacktrace:
[1] String
@ ./strings/string.jl:78
[2] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:324
[3] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[4] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[5] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[7] _broadcast_getindex
@ ./broadcast.jl:646
[8] getindex
@ ./broadcast.jl:605
[9] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to a lazy-initialized function (call to ijl_pchar_to_string)
Stacktrace:
[1] String
@ ./strings/string.jl:80
[2] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:324
[3] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[4] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[5] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[7] _broadcast_getindex
@ ./broadcast.jl:646
[8] getindex
@ ./broadcast.jl:605
[9] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to a lazy-initialized function (call to jl_genericmemory_to_string)
Stacktrace:
[1] String
@ ./strings/string.jl:78
[2] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:325
[3] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[4] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[5] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[7] _broadcast_getindex
@ ./broadcast.jl:646
[8] getindex
@ ./broadcast.jl:605
[9] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to a lazy-initialized function (call to ijl_pchar_to_string)
Stacktrace:
[1] String
@ ./strings/string.jl:80
[2] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:325
[3] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[4] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[5] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[7] _broadcast_getindex
@ ./broadcast.jl:646
[8] getindex
@ ./broadcast.jl:605
[9] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported dynamic function invocation (call to print_to_string(xs...) @ Base strings/io.jl:137)
Stacktrace:
[1] string
@ ./strings/io.jl:189
[2] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:326
[3] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[4] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[5] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[6] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[7] _broadcast_getindex
@ ./broadcast.jl:646
[8] getindex
@ ./broadcast.jl:605
[9] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] #IOBuffer#519
@ ./iobuffer.jl:128
[4] GenericIOBuffer
@ ./iobuffer.jl:119
[5] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:323
[6] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[7] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[8] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[9] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[10] _broadcast_getindex
@ ./broadcast.jl:646
[11] getindex
@ ./broadcast.jl:605
[12] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] #IOBuffer#519
@ ./iobuffer.jl:128
[4] GenericIOBuffer
@ ./iobuffer.jl:119
[5] throw_converterror
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:323
[6] _convert
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/fixed.jl:81
[7] FixedPoint
@ ~/.julia/packages/FixedPointNumbers/Dn4hv/src/FixedPointNumbers.jl:58
[8] #1155
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:186
[9] _broadcast_getindex_evalf
@ ./broadcast.jl:673
[10] _broadcast_getindex
@ ./broadcast.jl:646
[11] getindex
@ ./broadcast.jl:605
[12] #34
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:59
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] _similar_data
@ ./iobuffer.jl:156
[4] _resize!
@ ./iobuffer.jl:317
[5] ensureroom
@ ./iobuffer.jl:401
[6] unsafe_write
@ ./iobuffer.jl:522
[7] unsafe_write
@ ./io.jl:452
[8] unsafe_write
@ ./io.jl:803
[9] write
@ ./io.jl:837
[10] show
@ ./ryu/Ryu.jl:118
[11] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] _similar_data
@ ./iobuffer.jl:156
[4] _resize!
@ ./iobuffer.jl:317
[5] ensureroom
@ ./iobuffer.jl:401
[6] unsafe_write
@ ./iobuffer.jl:522
[7] unsafe_write
@ ./io.jl:452
[8] unsafe_write
@ ./io.jl:803
[9] write
@ ./io.jl:837
[10] show
@ ./ryu/Ryu.jl:118
[11] multiple call sites
@ unknown:0
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] StringMemory
@ ./iobuffer.jl:44
[3] _similar_data
@ ./iobuffer.jl:156
[4] ensureroom_slowpath
@ ./iobuffer.jl:371
[5] ensureroom
@ ./iobuffer.jl:396
[6] unsafe_write
@ ./iobuffer.jl:522
[7] unsafe_write
@ ./io.jl:452
[8] unsafe_write
@ ./io.jl:803
[9] write
@ ./io.jl:837
[10] show
@ ./ryu/Ryu.jl:118
[11] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] unsafe_wrap
@ ./strings/string.jl:119
[2] StringMemory
@ ./iobuffer.jl:44
[3] _similar_data
@ ./iobuffer.jl:156
[4] ensureroom_slowpath
@ ./iobuffer.jl:371
[5] ensureroom
@ ./iobuffer.jl:396
[6] unsafe_write
@ ./iobuffer.jl:522
[7] unsafe_write
@ ./io.jl:452
[8] unsafe_write
@ ./io.jl:803
[9] write
@ ./io.jl:837
[10] show
@ ./ryu/Ryu.jl:118
[11] multiple call sites
@ unknown:0
Reason: unsupported call to an unknown function (call to jl_alloc_genericmemory)
Stacktrace:
[1] GenericMemory
@ ./boot.jl:516
[2] array_new_memory
@ ./Base.jl:335
[3] #133
@ ./array.jl:1129
[4] _growend!
@ ./array.jl:1116
[5] resize!
@ ./array.jl:1450
[6] show
@ ./ryu/Ryu.jl:118
[7] multiple call sites
@ unknown:0
Reason: unsupported call to an external C function (call to ijl_alloc_string)
Stacktrace:
[1] _string_n
@ ./strings/string.jl:109
[2] array_new_memory
@ ./Base.jl:331
[3] #133
@ ./array.jl:1129
[4] _growend!
@ ./array.jl:1116
[5] resize!
@ ./array.jl:1450
[6] show
@ ./ryu/Ryu.jl:118
[7] multiple call sites
@ unknown:0
Reason: unsupported call to a lazy-initialized function (call to jl_string_to_genericmemory)
Stacktrace:
[1] array_new_memory
@ ./Base.jl:332
[2] #133
@ ./array.jl:1129
[3] _growend!
@ ./array.jl:1116
[4] resize!
@ ./array.jl:1450
[5] show
@ ./ryu/Ryu.jl:118
[6] multiple call sites
@ unknown:0
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
[1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/validation.jl:147
[2] macro expansion
@ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:382 [inlined]
[3] macro expansion
@ ~/.julia/packages/TimerOutputs/6KVfH/src/TimerOutput.jl:253 [inlined]
[4] macro expansion
@ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:381 [inlined]
[5] emit_llvm(job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, only_entry::Bool)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:108
[6] emit_llvm
@ ~/.julia/packages/GPUCompiler/2CW9L/src/utils.jl:106 [inlined]
[7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; toplevel::Bool, libraries::Bool, optimize::Bool, cleanup::Bool, validate::Bool, strip::Bool, only_entry::Bool, parent_job::Nothing)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:100
[8] codegen
@ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:82 [inlined]
[9] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:79
[10] compile
@ ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:74 [inlined]
[11] #1145
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:250 [inlined]
[12] JuliaContext(f::CUDA.var"#1145#1148"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:34
[13] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/driver.jl:25
[14] compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/compilation.jl:249
[15] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:237
[16] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/2CW9L/src/execution.jl:151
[17] macro expansion
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:380 [inlined]
[18] macro expansion
@ ./lock.jl:273 [inlined]
[19] cufunction(f::GPUArrays.var"#34#36", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Q24f7, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Tuple{Base.OneTo{Int64}}, CUDA.var"#1155#1156"{Q24f7}, Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::@Kwargs{})
@ CUDA ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:375
[20] cufunction
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:372 [inlined]
[21] macro expansion
@ ~/.julia/packages/CUDA/2kjXI/src/compiler/execution.jl:112 [inlined]
[22] #launch_heuristic#1200
@ ~/.julia/packages/CUDA/2kjXI/src/gpuarrays.jl:17 [inlined]
[23] launch_heuristic
@ ~/.julia/packages/CUDA/2kjXI/src/gpuarrays.jl:15 [inlined]
[24] _copyto!
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:78 [inlined]
[25] copyto!
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:44 [inlined]
[26] copy
@ ~/.julia/packages/GPUArrays/qt4ax/src/host/broadcast.jl:29 [inlined]
[27] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1, CUDA.DeviceMemory}, Nothing, Type{Q24f7}, Tuple{CuArray{Float32, 1, CUDA.DeviceMemory}}})
@ Base.Broadcast ./broadcast.jl:867
[28] top-level scope
@ /data/ALOS/rosamond/julia_backproject/fp_gpu.jl:9
in expression starting at /data/ALOS/rosamond/julia_backproject/fp_gpu.jl:9
I've had trouble on GPU with the converting constructors, both from float, Fixed{T,f}(v::Float64), and from fixed-point, Fixed{T,f}(v::Fixed{T2,f2}). The constructor with the dummy parameter, Fixed{T,f}(v::Integer, _), seems to work fine but is pretty low-level. Personally I'm okay doing the float conversion on CPU and copying to GPU, but the fixed-to-fixed conversions need to happen on the device throughout my algorithm.
I'm pretty new to both this package and CUDA.jl, so I don't have a good sense of how to proceed or how much work would be involved.
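The FixedPointNumbers frames in the log all pass through throw_converterror, which builds its error message with string and IOBuffer, and that string handling is what the GPU compiler rejects. One possible direction is a conversion that works on the raw integer and never reaches that error path. The following is only a minimal sketch under stated assumptions: unchecked_fixed is a hypothetical helper (not part of FixedPointNumbers), it performs no range check, so out-of-range inputs produce wrong values instead of an error, and the fixed-to-fixed path truncates rather than rounds when it drops fractional bits. It relies on reinterpret, which wraps the dummy-parameter constructor mentioned above.

```julia
using FixedPointNumbers

# Hypothetical helper (an assumption, not package API): float -> Fixed via the
# raw integer. No range check, so no error-message strings are ever built.
# Assumes f is small enough that 2^f fits in an Int64.
@inline function unchecked_fixed(::Type{Fixed{T,f}}, x::AbstractFloat) where {T,f}
    scaled = round(x * oftype(x, Int64(1) << f))   # scale by 2^f, round to nearest
    return reinterpret(Fixed{T,f}, unsafe_trunc(T, scaled))
end

# Hypothetical helper: Fixed -> Fixed by shifting the raw integer.
# Gaining fractional bits is exact (barring overflow); losing them truncates.
@inline function unchecked_fixed(::Type{Fixed{T,f}}, x::Fixed{T2,f2}) where {T,f,T2,f2}
    raw = reinterpret(x)                           # underlying integer of type T2
    shifted = f >= f2 ? (raw % T) << (f - f2) : (raw >> (f2 - f)) % T
    return reinterpret(Fixed{T,f}, shifted)
end

a = unchecked_fixed(Q24f7, 1.5f0)    # Float32 -> Q24f7
b = unchecked_fixed(Q15f16, a)       # Q24f7   -> Q15f16 (more fractional bits)
c = unchecked_fixed(Q24f7, b)        # Q15f16  -> Q24f7  (fewer fractional bits)
```

On the device this would presumably be applied with something like map(x -> unchecked_fixed(Q24f7, x), x_gpu). Whether that compiles under CUDA.jl is untested here; the sketch only removes the calls that the log flags inside FixedPointNumbers.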