Open
Description
Because `select`, `transform`, and `combine` do not work on GDTables directly, the intuitive workaround is to apply them separately to the individual DTables that make up the GDTable.
However, this fails with a `convert` error that seems to be an implementation detail (the stack trace points at `pull_next_chunk!` in `dtable_column.jl`).
using Distributed
# add two further julia processes which could run on other machines
addprocs(2, exeflags="--threads=2")
# Distributed.@everywhere executes code on all processes
@everywhere using Dagger
# Dagger treats both threads and worker processes as processors
Dagger.all_processors()
using DTables, DataFrames, CSV, Statistics
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
files = [url, url, url, url, url]
d = DTable(DataFrame ∘ CSV.File ∘ download, files)
g = DTables.groupby(d, :species)
e = first(g)[2]
combine(fetch(e), :sepal_width => mean) # works
combine(e, :sepal_width => mean) # fails
The error is shown below:
julia> combine(e, :sepal_width => mean) # fails
ERROR: ThunkFailedException:
Root Exception Type: RemoteException
Root Exception:
On worker 2:
MethodError: Cannot `convert` an object of type Int64 to an object of type Tuple{SentinelArrays.IndexIterator{Vector{Float64}}, Tuple{Vector{Vector{Float64}}, Int64, Vector{Float64}, Int64, Int64, Int64}}
Closest candidates are:
convert(::Type{T}, ::T) where T<:Tuple
@ Base essentials.jl:456
convert(::Type{T}, ::T) where T
@ Base Base.jl:84
convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple}
@ Base essentials.jl:457
...
Stacktrace:
[1] cvt1
@ ./essentials.jl:468 [inlined]
[2] ntuple
@ ./ntuple.jl:49 [inlined]
[3] convert
@ ./essentials.jl:470
[4] convert
@ ./some.jl:37
[5] setproperty!
@ ./Base.jl:40
[6] pull_next_chunk!
@ ~/.julia/packages/DTables/EiSy4/src/table/dtable_column.jl:53
[7] iterate
@ ~/.julia/packages/DTables/EiSy4/src/table/dtable_column.jl:67
[8] mean
@ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:62
[9] mean
@ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:44
[10] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[11] invokelatest
@ ./essentials.jl:889 [inlined]
[12] #41
@ ~/.julia/packages/Dagger/Tx54v/src/threadproc.jl:20
Stacktrace:
[1] wait
@ ./task.jl:352 [inlined]
[2] fetch
@ ./task.jl:372 [inlined]
[3] #execute!#40
@ ~/.julia/packages/Dagger/Tx54v/src/threadproc.jl:26
[4] execute!
@ ~/.julia/packages/Dagger/Tx54v/src/threadproc.jl:13
[5] #167
@ ~/.julia/packages/Dagger/Tx54v/src/sch/Sch.jl:1611 [inlined]
[6] #21
@ ~/.julia/packages/Dagger/Tx54v/src/options.jl:18 [inlined]
[7] #1
@ ~/.julia/packages/ScopedValues/Kvcrb/src/ScopedValues.jl:163
[8] with_logstate
@ ./logging.jl:515
[9] with_logger
@ ./logging.jl:627 [inlined]
[10] enter_scope
@ ~/.julia/packages/ScopedValues/Kvcrb/src/payloadlogger.jl:17 [inlined]
[11] with
@ ~/.julia/packages/ScopedValues/Kvcrb/src/ScopedValues.jl:162
[12] with_options
@ ~/.julia/packages/Dagger/Tx54v/src/options.jl:17
[13] do_task
@ ~/.julia/packages/Dagger/Tx54v/src/sch/Sch.jl:1609
[14] #143
@ ~/.julia/packages/Dagger/Tx54v/src/sch/Sch.jl:1302
Root Thunk: Thunk(id=29, mean(DTableColumn{Float64, Tuple{Float64, Tuple{SentinelArrays.IndexIterator{Vector{Float64}}, Tuple{Vector{Vector{Float64}}, Int64, Vector{Float64}, Int64, Int64, Int64}}}}(DTable with 1 partitions
Tabletype: unknown (use `tabletype!(::DTable)`), 2, :sepal_width, [250], 0, nothing, nothing)))
Inner Thunk: Thunk(id=30, length(Thunk[29](mean, ...)))
This Thunk: Thunk(id=30, length(Thunk[29](mean, ...)))
Stacktrace:
[1] fetch(t::Dagger.ThunkFuture; proc::OSProc, raw::Bool)
@ Dagger ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:16
[2] fetch
@ ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:11 [inlined]
[3] #fetch#73
@ ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:58 [inlined]
[4] fetch
@ ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:54 [inlined]
[5] _manipulate(df::DTable, normalized_cs::Vector{Any}, copycols::Bool, keeprows::Bool)
@ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:97
[6] manipulate(df::DTable, cs::Any; copycols::Bool, keeprows::Bool, renamecols::Bool)
@ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:47
[7] combine(df::DTable, args::Any; renamecols::Bool, threads::Bool)
@ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:190
[8] combine(df::DTable, args::Any)
@ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:189
[9] top-level scope
@ REPL[15]:1
EDIT:
Fortunately, the `reduce` function works, but it is not very convenient, especially for computing a mean.
julia> fetch(reduce(+, e, cols=[:sepal_width])).sepal_width / length(e)
2.974
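To illustrate why the `reduce` route above yields the mean, here is a minimal sketch in plain Julia that does not use DTables at all. It assumes the column is split into partitions, represented here by ordinary vectors with made-up values; the name `partitioned_mean` is hypothetical and not part of any package. The idea is the same as dividing the reduced sum by `length(e)`: each partition contributes only a partial sum and a row count.

```julia
# Hypothetical helper (not a DTables function): mean of a column that is
# stored as several partitions, using only sum-style reductions.
function partitioned_mean(partitions)
    # One partial sum and one row count per partition.
    partial_sums = map(sum, partitions)
    partial_counts = map(length, partitions)
    # Combine the partial results, then divide once at the end.
    return reduce(+, partial_sums) / reduce(+, partial_counts)
end

# Made-up example data: three partitions of unequal size.
parts = [[3.5, 3.0, 3.2], [3.1, 3.6], [3.9, 3.4, 3.4, 2.9]]
m = partitioned_mean(parts)
```

Dividing once at the end matters: averaging the per-partition means would give a different result whenever the partitions have unequal sizes.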