Commit 0a0032c

jlchan and piever authored
Expand documentation, add discussion on counterintuitive behavior (#188)
* Standardizing markdown sections in README converting ### to ##
* splitting README material into documenter sections
* add StaticArray example
* fix typo
* mentioning on-the-fly construction of StructArray entries in overview.md
* discussing mutability for counterintuitive behaviors
* adding counterintuitive behavior docs to make.jl
* adding an extra initialization section
* setting "Overview" as the default doc homepage
  moving index.md to reference.md, moving overview.md to index.md (and deleting overview.md)
* Apply suggestions from code review
  Co-authored-by: Pietro Vertechi <pietro.vertechi@protonmail.com>
* removing make.jl TODOs
* Update docs/src/counterintuitive.md
  Co-authored-by: Pietro Vertechi <pietro.vertechi@protonmail.com>

Co-authored-by: Jesse Chan <jesse.chan@rice.edu>
Co-authored-by: Pietro Vertechi <pietro.vertechi@protonmail.com>
1 parent e0b70ac commit 0a0032c

7 files changed: +524 -43 lines changed
README.md

Lines changed: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ julia> StructArray([1+im, 3-2im])
  3 - 2im
 ```

-### Collection and initialization
+## Collection and initialization

 One can also create a `StructArray` from an iterable of structs without creating an intermediate `Array`:

@@ -76,7 +76,7 @@ julia> rand!(s)
  0.92407+0.929336im 0.267358+0.804478im
 ```

-### Using custom array types
+## Using custom array types

 StructArrays supports using custom array types. It is always possible to pass field arrays of a custom type. The "custom array of structs to struct of custom arrays" transformation will use the `similar` method of the custom array type. This can be useful when working on the GPU for example:

@@ -153,7 +153,7 @@ julia> push!(t, (a = 3, b = "z"))
  (a = 3, b = "z")
 ```

-### Lazy row iteration
+## Lazy row iteration

 StructArrays also provides a `LazyRow` wrapper for lazy row iteration. `LazyRow(t, i)` does not materialize the i-th row but returns a lazy wrapper around it on which `getproperty` does the correct thing. This is useful when the row has many fields only some of which are necessary. It also allows changing columns in place.

docs/make.jl

Lines changed: 8 additions & 1 deletion
@@ -6,7 +6,14 @@ using StructArrays
 makedocs(
     sitename = "StructArrays",
     format = Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true"),
-    modules = [StructArrays]
+    modules = [StructArrays],
+    pages = [
+        "Overview"=>"index.md",
+        "Example usage"=>"examples.md",
+        "Some counterintuitive behaviors"=>"counterintuitive.md",
+        "Advanced techniques"=>"advanced.md",
+        "Index"=>"reference.md",
+    ]
 )

 # Documenter can also automatically deploy documentation to gh-pages.

docs/src/advanced.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
# Advanced techniques

## Structures with non-standard data layout

StructArrays supports structures with a custom data layout. The user is required to overload `staticschema` in order to define the custom layout, `component` to access fields of the custom layout, and `createinstance(T, fields...)` to create an instance of type `T` from its custom fields `fields`. In other words, given `x::T`, `createinstance(T, (component(x, f) for f in fieldnames(staticschema(T)))...)` should successfully return an instance of type `T`.

Here is an example of a type `MyType` whose custom fields are its field `data` together with the fields of its field `rest` (which is a named tuple):

```julia
using StructArrays

struct MyType{T, NT<:NamedTuple}
    data::T
    rest::NT
end

MyType(x; kwargs...) = MyType(x, values(kwargs))

# the custom layout: `data` followed by the fields of `rest`
function StructArrays.staticschema(::Type{MyType{T, NamedTuple{names, types}}}) where {T, names, types}
    return NamedTuple{(:data, names...), Base.tuple_type_cons(T, types)}
end

# access a field of the custom layout
function StructArrays.component(m::MyType, key::Symbol)
    return key === :data ? getfield(m, 1) : getfield(getfield(m, 2), key)
end

# generate an instance of MyType type
function StructArrays.createinstance(::Type{MyType{T, NT}}, x, args...) where {T, NT}
    return MyType(x, NT(args))
end

s = [MyType(rand(), a=1, b=2) for i in 1:10]
StructArray(s)
```

In the above example, `MyType` is composed of `data` of type `Float64` and `rest` of type `NamedTuple`. In many practical cases involving custom types, it is hard for StructArrays to widen the types automatically when they are heterogeneous. The following example demonstrates a widening method for that scenario.

```julia
using Tables

# add a source of custom type data
struct Location{U}
    x::U
    y::U
end
struct Region{V}
    area::V
end

s1 = MyType(Location(1, 0), place = "Delhi", rainfall = 200)
s2 = MyType(Location(2.5, 1.9), place = "Mumbai", rainfall = 1010)
s3 = MyType(Region([Location(1, 0), Location(2.5, 1.9)]), place = "North India", rainfall = missing)

s = [s1, s2, s3]
# Now if we try to do StructArray(s)
# we will get an error

function meta_table(iter)
    cols = Tables.columntable(iter)
    meta_table(first(cols), Base.tail(cols))
end

function meta_table(data, rest::NT) where NT<:NamedTuple
    F = MyType{eltype(data), StructArrays.eltypes(NT)}
    return StructArray{F}(; data=data, rest...)
end

meta_table(s)
```

The above strategy has been tested and implemented in [GeometryBasics.jl](https://github.com/JuliaGeometry/GeometryBasics.jl).

## Mutate-or-widen style accumulation

StructArrays provides a function `StructArrays.append!!(dest, src)` (unexported) for "mutate-or-widen" style accumulation. This function can be used via [`BangBang.append!!`](https://juliafolds.github.io/BangBang.jl/dev/#BangBang.append!!) and [`BangBang.push!!`](https://juliafolds.github.io/BangBang.jl/dev/#BangBang.push!!) as well.

`StructArrays.append!!` works like `append!(dest, src)` if `dest` can hold every element type produced by the `src` iterator; i.e., it _mutates_ `dest` in-place:

```julia
julia> dest = StructVector((a=[1], b=[2]))
1-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
 (a = 1, b = 2)

julia> StructArrays.append!!(dest, [(a = 3, b = 4)])
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
 (a = 1, b = 2)
 (a = 3, b = 4)

julia> ans === dest
true
```

Unlike `append!`, `append!!` can also _widen_ the element type of the `dest` array:

```julia
julia> StructArrays.append!!(dest, [(a = missing, b = 6)])
3-element StructArray(::Array{Union{Missing, Int64},1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}:
 NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}((1, 2))
 NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}((3, 4))
 NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}((missing, 6))

julia> ans === dest
false
```

Since the original array `dest` cannot hold the input, a new array is created (`ans !== dest`).

Combined with [function barriers](https://docs.julialang.org/en/latest/manual/performance-tips/#kernel-functions-1), `append!!` is a useful building block for implementing `collect`-like functions.
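
As a rough illustration of that last point, the sketch below accumulates several chunks of rows into one `StructArray`. The helper names `collect_chunks` and `accumulate_chunk` are hypothetical and not part of StructArrays; the only library calls assumed are the `StructArray` constructor and `StructArrays.append!!` as shown above.

```julia
using StructArrays

# Hypothetical function barrier: each call is dispatched on the concrete
# type of `dest`, which may have changed after a widening step.
accumulate_chunk(dest, chunk) = StructArrays.append!!(dest, chunk)

# Hypothetical `collect`-like helper: `chunks` is an iterable of row vectors.
function collect_chunks(chunks)
    dest = StructArray(first(chunks))           # copies the first chunk into columns
    for chunk in Iterators.drop(chunks, 1)
        # Rebind `dest`: `append!!` returns a new array when it has to widen.
        dest = accumulate_chunk(dest, chunk)
    end
    return dest
end

# A tuple of chunks, so that each chunk keeps its own element type.
chunks = ([(a = 1, b = 2)], [(a = 3, b = 4)], [(a = missing, b = 6)])
result = collect_chunks(chunks)
```

The important detail is that the return value of `append!!` is always rebound to `dest`, since the original array is only reused when it can already hold the new elements.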

## Using StructArrays in CUDA kernels

It is possible to combine StructArrays with [CUDAnative](https://github.com/JuliaGPU/CUDAnative.jl), in order to create CUDA kernels that work on StructArrays directly on the GPU. Make sure you are familiar with the CUDAnative documentation (esp. kernels with plain `CuArray`s) before experimenting with kernels based on `StructArray`s.

```julia
using CUDAnative, CuArrays, StructArrays
d = StructArray(a = rand(100), b = rand(100))

# move to GPU
dd = replace_storage(CuArray, d)
de = similar(dd)

# a simple kernel, to copy the content of `dd` onto `de`
function kernel!(dest, src)
    i = (blockIdx().x-1)*blockDim().x + threadIdx().x
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd),threads)

@cuda threads=threads blocks=blocks kernel!(de, dd)
```

docs/src/counterintuitive.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# Some counterintuitive behaviors

StructArrays doesn't explicitly store any structs; rather, it materializes a struct element on the fly when `getindex` is called. This is typically very efficient; for example, if all the struct fields are `isbits`, then materializing a new struct does not allocate. However, this can lead to counterintuitive behavior when modifying entries of a StructArray.
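
The on-the-fly materialization can be observed directly. In the minimal sketch below, `Point` is a hypothetical mutable struct introduced only for illustration; two reads of the same entry produce two distinct objects with equal field values.

```julia
using StructArrays

# `Point` is a hypothetical type used only for this illustration.
mutable struct Point
    x::Float64
    y::Float64
end

pts = StructArray([Point(1.0, 2.0), Point(3.0, 4.0)])

# The data is stored as one plain array per field.
xs = pts.x
ys = pts.y

# Each `getindex` call builds a fresh `Point` from `pts.x[1]` and `pts.y[1]`.
p = pts[1]
q = pts[1]

same_object = p === q                       # identity comparison for mutable structs
same_fields = (p.x, p.y) == (q.x, q.y)      # field-wise comparison
```

Here `same_object` is `false` while `same_fields` is `true`, which is why mutating `p` has no effect on `pts`.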

## Modifying the field of a struct element

```julia
julia> mutable struct Foo{T}
       a::T
       b::T
       end

julia> x = StructArray([Foo(1,2) for i = 1:5])

julia> x[1].a = 10

julia> x # remains unchanged
5-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype Foo{Int64}:
 Foo{Int64}(1, 2)
 Foo{Int64}(1, 2)
 Foo{Int64}(1, 2)
 Foo{Int64}(1, 2)
 Foo{Int64}(1, 2)
```
The assignment `x[1].a = 10` first calls `getindex(x,1)`, then sets property `a` of the accessed element. However, since StructArrays constructs `Foo(x.a[1],x.b[1])` on the fly when accessing `x[1]`, setting `x[1].a = 10` modifies the materialized struct rather than the StructArray `x`.

Note that one can modify a field of a StructArray entry via `x.a[1] = 10` (the order of `getproperty` and `getindex` matters). As an added benefit, this does not require that the struct `Foo` is mutable, as it modifies the underlying component array `x.a` directly.
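
A short sketch of this, using a hypothetical immutable struct `ImmutableFoo` (not part of the original example) to show that the write goes to the component array and needs no mutability:

```julia
using StructArrays

# `ImmutableFoo` is a hypothetical immutable counterpart of `Foo`.
struct ImmutableFoo{T}
    a::T
    b::T
end

x = StructArray([ImmutableFoo(1, 2) for i = 1:3])

# `getproperty` first, then `setindex!`: this writes into the component array `x.a`.
x.a[1] = 10

first_entry = x[1]     # materialized from the updated component arrays
```

After the assignment, `first_entry.a` is `10` and the remaining entries are untouched.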

For mutable structs, it is possible to write code that works for both regular `Array`s and `StructArray`s by reading the element, modifying it, and writing it back:
```julia
e = x[1]
e.a = 10
x[1] = e
```

`x[1]` creates a new `Foo` element and `e.a = 10` modifies its field `a`. Assigning `e` back to `x[1]` then unpacks `a` and `b` from the modified struct and assigns entries of the component arrays `x.a[1] = a`, `x.b[1] = b`. Note that the one-line form `x[1] = x[1].a = 10` does not achieve this: in Julia, the assignment expression `x[1].a = 10` evaluates to its right-hand side `10`, not to the modified struct.
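
One way to package a read-modify-write update so that it behaves the same for both container types is sketched below. The helper `set_a!` is hypothetical; it binds the element to a local variable first, because an assignment expression such as `x[1].a = 10` evaluates to its right-hand side rather than to the modified struct.

```julia
using StructArrays

mutable struct Foo{T}
    a::T
    b::T
end

# Hypothetical helper: update field `a` of the i-th element of `v`.
function set_a!(v::AbstractVector, i, val)
    e = v[i]      # Array: the stored object; StructArray: a freshly materialized one
    e.a = val     # mutate that object
    v[i] = e      # write it back; a StructArray unpacks it into its component arrays
    return v
end

arr = [Foo(1, 2) for i = 1:3]
sa = StructArray([Foo(1, 2) for i = 1:3])

set_a!(arr, 1, 10)
set_a!(sa, 1, 10)
```

For the plain `Array` the final write-back is redundant but harmless, since `e` is already the stored object.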

## Broadcasted assignment for array entries

Broadcasted in-place assignment can also behave counterintuitively for StructArrays.
```julia
julia> using StructArrays, StaticArrays

julia> mutable struct Bar{T} <: FieldVector{2,T}
       a::T
       b::T
       end

julia> x = StructArray([Bar(1,2) for i = 1:5])
5-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype Bar{Int64}:
 [1, 2]
 [1, 2]
 [1, 2]
 [1, 2]
 [1, 2]

julia> x[1] .= 1
2-element Bar{Int64} with indices SOneTo(2):
 1
 1

julia> x
5-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype Bar{Int64}:
 [1, 2]
 [1, 2]
 [1, 2]
 [1, 2]
 [1, 2]
```
Because `x[1] .= 1` first materializes a `Bar` struct, the broadcasted assignment modifies this new materialized struct rather than the StructArray `x`. Note, however, that `x[1] = x[1] .= 1` works, since the broadcasted assignment returns the modified materialized struct, which is then assigned to the first entry of `x`.

## Mutable struct types

Each of these counterintuitive behaviors occurs when using StructArrays with mutable elements. However, since the component arrays of a StructArray are generally mutable even if its entries are immutable, a StructArray with immutable elements will in many cases behave identically to (but be more efficient than) a StructArray with mutable elements. Thus, it is recommended to use immutable structs with StructArray whenever possible.
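
To make the recommendation concrete, here is a minimal sketch with a hypothetical immutable struct `Particle`. It assumes nothing beyond `getindex`, `setindex!`, and property access on a `StructArray`, and shows two common updates that never mutate an element.

```julia
using StructArrays

# `Particle` is a hypothetical immutable type used only for this illustration.
struct Particle
    pos::Float64
    vel::Float64
end

ps = StructArray([Particle(0.0, 1.0) for i = 1:3])

# Replace a whole entry with a newly constructed immutable value.
ps[1] = Particle(ps[1].pos + 1.0, ps[1].vel)

# Update an entire field in place by broadcasting over its component array.
ps.pos .+= 0.5
```

Both operations write to the component arrays `ps.pos` and `ps.vel`, so the changes are visible on the next `getindex`.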

docs/src/examples.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
## Example usage to store complex numbers

```julia
julia> using StructArrays, Random

julia> Random.seed!(4);

julia> s = StructArray{ComplexF64}((rand(2,2), rand(2,2)))
2×2 StructArray(::Array{Float64,2}, ::Array{Float64,2}) with eltype Complex{Float64}:
 0.680079+0.625239im   0.92407+0.267358im
 0.874437+0.737254im  0.929336+0.804478im

julia> s[1, 1]
0.680079235935741 + 0.6252391193298537im

julia> s.re
2×2 Array{Float64,2}:
 0.680079  0.92407
 0.874437  0.929336

julia> StructArrays.components(s) # obtain all field arrays as a named tuple
(re = [0.680079 0.92407; 0.874437 0.929336], im = [0.625239 0.267358; 0.737254 0.804478])
```

Note that the same approach can be used directly from an `Array` of complex numbers:

```julia
julia> StructArray([1+im, 3-2im])
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype Complex{Int64}:
 1 + 1im
 3 - 2im
```

## Example usage to store a data table

```julia
julia> t = StructArray((a = [1, 2], b = ["x", "y"]))
2-element StructArray(::Array{Int64,1}, ::Array{String,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,String}}:
 (a = 1, b = "x")
 (a = 2, b = "y")

julia> t[1]
(a = 1, b = "x")

julia> t.a
2-element Array{Int64,1}:
 1
 2

julia> push!(t, (a = 3, b = "z"))
3-element StructArray(::Array{Int64,1}, ::Array{String,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,String}}:
 (a = 1, b = "x")
 (a = 2, b = "y")
 (a = 3, b = "z")
```

## Example usage with StaticArray elements

```julia
julia> using StructArrays, StaticArrays

julia> x = StructArray([SVector{2}(1,2) for i = 1:5])
5-element StructArray(::Vector{Tuple{Int64, Int64}}) with eltype SVector{2, Int64}:
 [1, 2]
 [1, 2]
 [1, 2]
 [1, 2]
 [1, 2]

julia> A = StructArray([SMatrix{2,2}([1 2;3 4]) for i = 1:5])
5-element StructArray(::Vector{NTuple{4, Int64}}) with eltype SMatrix{2, 2, Int64, 4}:
 [1 2; 3 4]
 [1 2; 3 4]
 [1 2; 3 4]
 [1 2; 3 4]
 [1 2; 3 4]

julia> B = StructArray([SArray{Tuple{2,2,2}}(reshape(1:8,2,2,2)) for i = 1:5]); B[1]
2×2×2 SArray{Tuple{2, 2, 2}, Int64, 3, 8} with indices SOneTo(2)×SOneTo(2)×SOneTo(2):
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8
```
