
select, transform, combine not working on a DTable which is part of GDTable #68

@schlichtanders


Because `select`, `transform`, and `combine` do not work on GDTables directly, the intuitive workaround is to apply them separately to the individual DTables that make up the GDTable.

However, this fails with a `convert` error that appears to stem from an implementation detail.

```julia
using Distributed
# add two further Julia processes, which could run on other machines
addprocs(2, exeflags="--threads=2")
# Distributed.@everywhere executes code on all processes
@everywhere using Dagger
# Dagger uses both threads and worker processes as processors
Dagger.all_processors()

using DTables, DataFrames, CSV, Statistics

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
files = [url, url, url, url, url]

d = DTable(DataFrame ∘ CSV.File ∘ download, files)
g = DTables.groupby(d, :species)

e = first(g)[2]  # the DTable of the first group
combine(fetch(e), :sepal_width => mean)  # works
combine(e, :sepal_width => mean)  # fails
```

The error is:

```
julia> combine(e, :sepal_width => mean)  # fails
ERROR: ThunkFailedException:
Root Exception Type: RemoteException
Root Exception:
On worker 2:
MethodError: Cannot `convert` an object of type Int64 to an object of type Tuple{SentinelArrays.IndexIterator{Vector{Float64}}, Tuple{Vector{Vector{Float64}}, Int64, Vector{Float64}, Int64, Int64, Int64}}

Closest candidates are:
  convert(::Type{T}, ::T) where T<:Tuple
   @ Base essentials.jl:456
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84
  convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple}
   @ Base essentials.jl:457
  ...

Stacktrace:
  [1] cvt1
    @ ./essentials.jl:468 [inlined]
  [2] ntuple
    @ ./ntuple.jl:49 [inlined]
  [3] convert
    @ ./essentials.jl:470
  [4] convert
    @ ./some.jl:37
  [5] setproperty!
    @ ./Base.jl:40
  [6] pull_next_chunk!
    @ ~/.julia/packages/DTables/EiSy4/src/table/dtable_column.jl:53
  [7] iterate
    @ ~/.julia/packages/DTables/EiSy4/src/table/dtable_column.jl:67
  [8] mean
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:62
  [9] mean
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:44
 [10] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [11] invokelatest
    @ ./essentials.jl:889 [inlined]
 [12] #41
    @ ~/.julia/packages/Dagger/Tx54v/src/threadproc.jl:20
Stacktrace:
  [1] wait
    @ ./task.jl:352 [inlined]
  [2] fetch
    @ ./task.jl:372 [inlined]
  [3] #execute!#40
    @ ~/.julia/packages/Dagger/Tx54v/src/threadproc.jl:26
  [4] execute!
    @ ~/.julia/packages/Dagger/Tx54v/src/threadproc.jl:13
  [5] #167
    @ ~/.julia/packages/Dagger/Tx54v/src/sch/Sch.jl:1611 [inlined]
  [6] #21
    @ ~/.julia/packages/Dagger/Tx54v/src/options.jl:18 [inlined]
  [7] #1
    @ ~/.julia/packages/ScopedValues/Kvcrb/src/ScopedValues.jl:163
  [8] with_logstate
    @ ./logging.jl:515
  [9] with_logger
    @ ./logging.jl:627 [inlined]
 [10] enter_scope
    @ ~/.julia/packages/ScopedValues/Kvcrb/src/payloadlogger.jl:17 [inlined]
 [11] with
    @ ~/.julia/packages/ScopedValues/Kvcrb/src/ScopedValues.jl:162
 [12] with_options
    @ ~/.julia/packages/Dagger/Tx54v/src/options.jl:17
 [13] do_task
    @ ~/.julia/packages/Dagger/Tx54v/src/sch/Sch.jl:1609
 [14] #143
    @ ~/.julia/packages/Dagger/Tx54v/src/sch/Sch.jl:1302
Root Thunk:  Thunk(id=29, mean(DTableColumn{Float64, Tuple{Float64, Tuple{SentinelArrays.IndexIterator{Vector{Float64}}, Tuple{Vector{Vector{Float64}}, Int64, Vector{Float64}, Int64, Int64, Int64}}}}(DTable with 1 partitions
Tabletype: unknown (use `tabletype!(::DTable)`), 2, :sepal_width, [250], 0, nothing, nothing)))
Inner Thunk: Thunk(id=30, length(Thunk[29](mean, ...)))
This Thunk:  Thunk(id=30, length(Thunk[29](mean, ...)))
Stacktrace:
 [1] fetch(t::Dagger.ThunkFuture; proc::OSProc, raw::Bool)
   @ Dagger ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:16
 [2] fetch
   @ ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:11 [inlined]
 [3] #fetch#73
   @ ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:58 [inlined]
 [4] fetch
   @ ~/.julia/packages/Dagger/Tx54v/src/eager_thunk.jl:54 [inlined]
 [5] _manipulate(df::DTable, normalized_cs::Vector{Any}, copycols::Bool, keeprows::Bool)
   @ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:97
 [6] manipulate(df::DTable, cs::Any; copycols::Bool, keeprows::Bool, renamecols::Bool)
   @ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:47
 [7] combine(df::DTable, args::Any; renamecols::Bool, threads::Bool)
   @ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:190
 [8] combine(df::DTable, args::Any)
   @ DTables ~/.julia/packages/DTables/EiSy4/src/operations/dataframes_interface.jl:189
 [9] top-level scope
   @ REPL[15]:1
```

EDIT:
Fortunately, `reduce` works, but it is not very convenient, especially for computing a mean:

```julia
julia> fetch(reduce(+, e, cols=[:sepal_width])).sepal_width / length(e)
2.974
```
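Until `combine` works on a single DTable, the `reduce`-based workaround can be wrapped in a small helper so the mean does not have to be assembled by hand each time. This is a sketch that assumes, as in the REPL line above, that `reduce(+, t; cols=[col])` sums the column across all partitions and fetches to a NamedTuple with one field per column, and that `length` returns the row count. `dtable_column_mean` is a hypothetical name, not part of DTables.

```julia
using DTables

# Hypothetical helper (not part of DTables): mean of a single column of a DTable,
# wrapping the reduce-based workaround from this issue.
function dtable_column_mean(t::DTable, col::Symbol)
    # reduce(+, ...) sums the requested column across all partitions;
    # fetching the result gives a NamedTuple with one field per column.
    totals = fetch(reduce(+, t; cols=[col]))
    # divide the column total by the number of rows in the DTable
    return getproperty(totals, col) / length(t)
end
```

With the `e` from the reproducer, `dtable_column_mean(e, :sepal_width)` should return the same value as the manual `reduce` expression.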
