Commit 1632bff

palday, nalimilan, kleinschmidt, and ararslan authored

termnames (#299)

* termnames
* use StatsAPI directly, deprecate old termnames
* deprecations

---------

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Dave Kleinschmidt <dave.f.kleinschmidt@gmail.com>
Co-authored-by: Alex Arslan <ararslan@comcast.net>
1 parent 4a7d159 commit 1632bff

14 files changed: +229 -104 lines changed

Project.toml

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 name = "StatsModels"
 uuid = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
-version = "0.7.2"
+version = "0.7.3"
 
 [deps]
 DataAPI = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
@@ -10,6 +10,7 @@ Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 REPL = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
 ShiftedArrays = "1277b4bf-5013-50f5-be3d-901d8477a67a"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
+StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
 Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
@@ -21,6 +22,7 @@ DataAPI = "1.1"
 DataFrames = "1"
 DataStructures = "0.17, 0.18"
 ShiftedArrays = "1, 2"
+StatsAPI = "1"
 StatsBase = "0.33.5, 0.34"
 StatsFuns = "0.9, 1.0"
 Tables = "0.2, 1"

docs/Project.toml

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
 StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+StatsAPI = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
 Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"

docs/src/api.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ end
 term
 coefnames
 modelcols
+termnames
 ```
 
 ### Higher-order terms

docs/src/internals.md

Lines changed: 5 additions & 5 deletions
@@ -80,7 +80,7 @@ FormulaTerm{Term, Term}
 ```
 
 !!! note
-
+
     As always, you can introspect which method is called with
 
     ```julia
@@ -395,7 +395,7 @@ possible to use an existing function, the best practice is to define a new
 function to make dispatch less ambiguous.
 
 ```jldoctest 1
-using StatsBase
+using StatsAPI
 # syntax: best practice to define a _new_ function
 poly(x, n) = x^n
 
@@ -444,7 +444,7 @@ StatsModels.termvars(p::PolyTerm) = StatsModels.termvars(p.term)
 # number of columns in the matrix this term produces
 StatsModels.width(p::PolyTerm) = p.deg
 
-StatsBase.coefnames(p::PolyTerm) = coefnames(p.term) .* "^" .* string.(1:p.deg)
+StatsAPI.coefnames(p::PolyTerm) = coefnames(p.term) .* "^" .* string.(1:p.deg)
 
 # output
 
@@ -558,9 +558,9 @@ PolyTerm{Term, ConstantTerm{Int64}}
 ```
 
 !!! note
-
+
     The functions like `poly` should be exported by the package that provides
-    the special syntax for two reasons. First, it makes run-time term
+    the special syntax for two reasons. First, it makes run-time term
     construction more convenient. Second, because of how the `@formula` macro
     generates code, the function that represents special syntax must be
     available in the namespace where `@formula` is _called_. This is because
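
Review note on the `coefnames(p::PolyTerm)` line changed above: the right-hand side builds one name per generated column by broadcasting string concatenation, with strings treated as scalars. A minimal sketch using only Base follows; `base_name` and `deg` are hypothetical stand-ins for `coefnames(p.term)` and `p.deg`, not names from the package.

```julia
# Hypothetical stand-ins for coefnames(p.term) and p.deg
base_name = "a"
deg = 3

# Broadcast string concatenation over 1:deg: one name per polynomial column
poly_names = base_name .* "^" .* string.(1:deg)
println(poly_names)  # ["a^1", "a^2", "a^3"]
```

The only change in this hunk is the owning module of the method (`StatsAPI` instead of `StatsBase`); the naming logic itself is untouched.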

src/StatsModels.jl

Lines changed: 5 additions & 1 deletion
@@ -1,13 +1,15 @@
 module StatsModels
 
 using Tables
+using StatsAPI
 using StatsBase
 using ShiftedArrays
 using ShiftedArrays: lag, lead
 using DataStructures
 using DataAPI
 using DataAPI: levels
 using Printf: @sprintf
+using StatsAPI: coefnames, fit, predict, dof
 using StatsFuns: chisqccdf
 
 using SparseArrays
@@ -32,10 +34,11 @@ export
     HelmertCoding,
     SeqDiffCoding,
     HypothesisCoding,
-
+
     coefnames,
     setcontrasts!,
     formula,
+    termnames,
 
     AbstractTerm,
     ConstantTerm,
@@ -81,5 +84,6 @@ include("formula.jl")
 include("modelframe.jl")
 include("statsmodel.jl")
 include("lrtest.jl")
+include("deprecated.jl")
 
 end # module StatsModels

src/contrasts.jl

Lines changed: 26 additions & 24 deletions
@@ -53,7 +53,7 @@ C(levels = ::Vector{Any}, base = ::Any) # specify levels and base
   mean of the lower levels
 * [`SeqDiffCoding`](@ref) - Code for differences between sequential levels of
   the variable.
-* [`HypothesisCoding`](@ref) - Manually specify contrasts via a hypothesis
+* [`HypothesisCoding`](@ref) - Manually specify contrasts via a hypothesis
   matrix, which gives the weighting for the average response for each level
 * [`StatsModels.ContrastsCoding`](@ref) - Manually specify contrasts matrix,
   which is directly copied into the model matrix.
@@ -79,15 +79,15 @@ The easiest way to specify custom contrasts is with `HypothesisCoding` or
 contrast coding system, you can subtype `AbstractContrasts`. This requires a
 constructor, a `contrasts_matrix` method for constructing the actual contrasts
 matrix that maps from levels to `ModelMatrix` column values, and (optionally) a
-`termnames` method:
+`coefnames` method:
 
 ```julia
 mutable struct MyCoding <: AbstractContrasts
     ...
 end
 
 contrasts_matrix(C::MyCoding, baseind, n) = ...
-termnames(C::MyCoding, levels, baseind) = ...
+coefnames(C::MyCoding, levels, baseind) = ...
 ```
 
 # References
@@ -103,30 +103,32 @@ abstract type AbstractContrasts end
 # Contrasts + Levels (usually from data) = ContrastsMatrix
 struct ContrastsMatrix{C <: AbstractContrasts, M <: AbstractMatrix, T, U}
     matrix::M
-    termnames::Vector{U}
+    coefnames::Vector{U}
     levels::Vector{T}
     contrasts::C
     invindex::Dict{T,Int}
     function ContrastsMatrix(matrix::M,
-                             termnames::Vector{U},
+                             coefnames::Vector{U},
                              levels::Vector{T},
                              contrasts::C) where {U, T, C <: AbstractContrasts, M <: AbstractMatrix}
         allunique(levels) || throw(ArgumentError("levels must be all unique, got $(levels)"))
         invindex = Dict{T,Int}(x=>i for (i,x) in enumerate(levels))
-        new{C,M,T,U}(matrix, termnames, levels, contrasts, invindex)
+        new{C,M,T,U}(matrix, coefnames, levels, contrasts, invindex)
     end
 end
 
-# only check equality of matrix, termnames, and levels, and that the type is the
+StatsAPI.coefnames(cm::ContrastsMatrix) = cm.coefnames
+
+# only check equality of matrix, coefnames, and levels, and that the type is the
 # same for the contrasts (values are irrelevant). This ensures that the two
 # will behave identically in creating modelmatrix columns
 Base.:(==)(a::ContrastsMatrix{C}, b::ContrastsMatrix{C}) where {C<:AbstractContrasts} =
     a.matrix == b.matrix &&
-    a.termnames == b.termnames &&
+    a.coefnames == b.coefnames &&
     a.levels == b.levels
 
 Base.hash(a::ContrastsMatrix{C}, h::UInt) where {C} =
-    hash(C, hash(a.matrix, hash(a.termnames, hash(a.levels, h))))
+    hash(C, hash(a.matrix, hash(a.coefnames, hash(a.levels, h))))
 
 """
 An instantiation of a contrast coding system for particular levels
@@ -166,7 +168,7 @@ function ContrastsMatrix(contrasts::C, levels::AbstractVector{T}) where {C<:Abst
     # 3. contrast levels missing from data: would have empty columns, generate a
     # rank-deficient model matrix.
     c_levels = something(DataAPI.levels(contrasts), levels)
-
+
     mismatched_levels = symdiff(c_levels, levels)
     if !isempty(mismatched_levels)
         throw(ArgumentError("contrasts levels not found in data or vice-versa: " *
@@ -198,7 +200,7 @@ function ContrastsMatrix(contrasts::C, levels::AbstractVector{T}) where {C<:Abst
                             "$c_levels."))
     end
 
-    tnames = termnames(contrasts, c_levels, baseind)
+    tnames = coefnames(contrasts, c_levels, baseind)
 
     mat = contrasts_matrix(contrasts, baseind, n)
 
@@ -224,7 +226,7 @@ function ContrastsMatrix(c::ContrastsMatrix, levels::AbstractVector)
     return c
 end
 
-function termnames(C::AbstractContrasts, levels::AbstractVector, baseind::Integer)
+function StatsAPI.coefnames(C::AbstractContrasts, levels::AbstractVector, baseind::Integer)
     not_base = [1:(baseind-1); (baseind+1):length(levels)]
     levels[not_base]
 end
@@ -233,7 +235,7 @@ Base.getindex(contrasts::ContrastsMatrix, rowinds, colinds) =
     getindex(contrasts.matrix, getindex.(Ref(contrasts.invindex), rowinds), colinds)
 
 # Making a contrast type T only requires that there be a method for
-# contrasts_matrix(T, baseind, n) and optionally termnames(T, levels, baseind)
+# contrasts_matrix(T, baseind, n) and optionally coefnames(T, levels, baseind)
 # The rest is boilerplate.
 for contrastType in [:DummyCoding, :EffectsCoding, :HelmertCoding]
     @eval begin
@@ -254,7 +256,7 @@ DataAPI.levels(c::AbstractContrasts) = nothing
     FullDummyCoding()
 
 Full-rank dummy coding generates one indicator (1 or 0) column for each level,
-**including** the base level. This is sometimes known as
+**including** the base level. This is sometimes known as
 [one-hot encoding](https://en.wikipedia.org/wiki/One-hot).
 
 Not exported but included here for the sake of completeness.
@@ -331,7 +333,7 @@ column is generated with 1 where `variable .== x` and -1 where `variable .== bas
 of 0.
 
 If `levels` are omitted or `nothing`, they are determined from the data
-by calling the `levels` function when constructing `ContrastsMatrix`.
+by calling the `levels` function when constructing `ContrastsMatrix`.
 If `base` is omitted or `nothing`, the first level is used as the base.
 
 When all levels are equally frequent, effects coding generates model matrix
@@ -373,7 +375,7 @@ Helmert coding codes each level as the difference from the average of the lower
 levels.
 
 If `levels` are omitted or `nothing`, they are determined from the data
-by calling the `levels` function when constructing `Contrastsmatrix`.
+by calling the `levels` function when constructing `Contrastsmatrix`.
 If `base` is omitted or `nothing`, the first level is used as the base.
 For each non-base level, Helmert coding generates a columns with -1 for each of
 n levels below, n for that level, and 0 above.
@@ -462,7 +464,7 @@ function contrasts_matrix(C::SeqDiffCoding, _, n)
 end
 
 # TODO: consider customizing term names:
-# termnames(C::SeqDiffCoding, levels::AbstractVector, baseind::Integer) =
+# StatsAPI.coefnames(C::SeqDiffCoding, levels::AbstractVector, baseind::Integer) =
 # ["$(levels[i])-$(levels[i-1])" for i in 2:length(levels)]
 
 """
@@ -591,7 +593,7 @@ function contrasts_matrix(C::HypothesisCoding, baseind, n)
     C.contrasts
 end
 
-termnames(C::HypothesisCoding, levels::AbstractVector, baseind::Int) =
+StatsAPI.coefnames(C::HypothesisCoding, levels::AbstractVector, baseind::Int) =
     something(C.labels, levels[1:length(levels) .!= baseind])
 
 DataAPI.levels(c::HypothesisCoding) = c.levels
@@ -602,8 +604,8 @@ DataAPI.levels(c::HypothesisCoding) = c.levels
 
 Coding by manual specification of contrasts matrix. For k levels, the contrasts
 must be a k by k-1 Matrix. The contrasts in this matrix will be copied directly
-into the model matrix; if you want to specify your contrasts as hypotheses (i.e.,
-weights assigned to each level's cell mean), you should use
+into the model matrix; if you want to specify your contrasts as hypotheses (i.e.,
+weights assigned to each level's cell mean), you should use
 [`HypothesisCoding`](@ref) instead.
 """
 mutable struct ContrastsCoding{T<:AbstractMatrix} <: AbstractContrasts
@@ -687,9 +689,9 @@ julia> StatsModels.hypothesis_matrix(cmat)
 -1 0 0 1
 ```
 
-For non-centered contrasts like `DummyCoding`, without including the intercept
-the hypothesis matrix is incorrect. So while `intercept=true` is the default for
-non-centered contrasts, you can see the (wrong) hypothesis matrix when ignoring
+For non-centered contrasts like `DummyCoding`, without including the intercept
+the hypothesis matrix is incorrect. So while `intercept=true` is the default for
+non-centered contrasts, you can see the (wrong) hypothesis matrix when ignoring
 it by forcing `intercept=false`:
 
 ```jldoctest hypmat
@@ -710,7 +712,7 @@ julia> StatsModels.hypothesis_matrix(cmat, tolerance=0) # ugly
 1.0 -2.23753e-16 6.91749e-18 -1.31485e-16
 -1.0 1.0 -2.42066e-16 9.93754e-17
 -1.0 4.94472e-17 1.0 9.93754e-17
--1.0 1.04958e-16 -1.31044e-16 1.0
+-1.0 1.04958e-16 -1.31044e-16 1.0
 ```
 
 Finally, the hypothesis matrix for a constructed `ContrastsMatrix` (as stored by
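
Review note on the renamed fallback methods in `src/contrasts.jl`: both the generic `coefnames(C::AbstractContrasts, levels, baseind)` method and the `HypothesisCoding` fallback return every level except the base level, using two different indexing idioms. The sketch below reproduces only that indexing logic in plain Julia so it runs without StatsModels; the helper names `default_coefnames` and `masked_coefnames` are hypothetical and do not exist in the package.

```julia
# Hypothetical helper mirroring the body of the generic fallback in this diff:
# concatenate the index ranges before and after the base level.
default_coefnames(levels::AbstractVector, baseind::Integer) =
    levels[[1:(baseind-1); (baseind+1):length(levels)]]

# Hypothetical helper mirroring the HypothesisCoding fallback (without labels):
# a Boolean mask that is false only at the base index.
masked_coefnames(levels::AbstractVector, baseind::Integer) =
    levels[1:length(levels) .!= baseind]

lv = ["a", "b", "c", "d"]
println(default_coefnames(lv, 1))  # ["b", "c", "d"]
println(default_coefnames(lv, 3))  # ["a", "b", "d"]
println(masked_coefnames(lv, 3) == default_coefnames(lv, 3))  # true
```

Since the two idioms agree, the rename from `termnames` to `coefnames` leaves the generated column names unchanged for codings that rely on these fallbacks.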

src/deprecated.jl

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+@deprecate(termnames(C::AbstractContrasts, levels::AbstractVector, baseind::Integer),
+           coefnames(C::AbstractContrasts, levels::AbstractVector, baseind::Integer),
+           false)
+
+function Base.getproperty(cm::ContrastsMatrix, x::Symbol)
+    if x === :termnames
+        Base.depwarn("The `termnames` field of `ConstrastsMatrix` is deprecated; use `coefnames(cm)` instead.",
+                     :ContrastsMatrix)
+        return coefnames(cm)
+    else
+        return getfield(cm, x)
+    end
+end
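
Review note on `src/deprecated.jl`: the `getproperty` overload keeps old code that reads `cm.termnames` working after the field was renamed to `coefnames`. The following is a minimal sketch of the same pattern, assuming a hypothetical `Demo` struct in place of `ContrastsMatrix` and a local `coefnames` function in place of `StatsAPI.coefnames`, so that it runs with Base alone.

```julia
# Hypothetical stand-in for ContrastsMatrix, reduced to the renamed field
struct Demo
    coefnames::Vector{String}
end

# Local accessor standing in for StatsAPI.coefnames(cm::ContrastsMatrix)
coefnames(d::Demo) = d.coefnames

# Intercept access to the old field name and forward it to the accessor;
# every other property falls through to the real field.
function Base.getproperty(d::Demo, x::Symbol)
    if x === :termnames
        Base.depwarn("The `termnames` field of `Demo` is deprecated; use `coefnames(d)` instead.",
                     :Demo)
        return coefnames(d)
    else
        return getfield(d, x)
    end
end

d = Demo(["b", "c"])
println(d.coefnames)  # ["b", "c"]
println(d.termnames)  # ["b", "c"]
```

`Base.depwarn` is silent by default and only prints when Julia runs with `--depwarn=yes` (it throws under `--depwarn=error`). The `false` passed as the third argument to `@deprecate` in the diff tells the macro not to export the deprecated method's name.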
