ordered
is a parsnip extension to
enable additional classification models for ordinal outcomes (e.g.,
“low”, “medium”, “high”). While there are several model/engine
combinations in the parsnip package that can be used, this package adds:
- ordinal regression via
MASS::polr()
- ordinal classification trees via
rpartScore::rpartScore()
(Galimberti, Soffritti, and Di Maso, 2012) - ordinal forests via
ordinalForest::ordfor()
(Hornung, 2020)
More will be added. Under consideration are:
- ordinal regression via
VGAM::vglm()
(Yee, 2015) - ordinal regression via
ordinalNet::ordinalNet()
(Wurm, Hanlon, and Rathouz, 2021) - Bayesian ordinal regression via
ordinalbayes::ordinalbayes()
(Zhang and Archer, 2021) - ordinal regression via
ordinal::clm()
(Christensen, 2023)
There are some existing features in tidymodels packages that are useful for ordinal outcomes:
- The partykit engines
for
parsnip::decision_tree()
andparsnip::rand_forest()
use the ordered nature of the factors to train the model. - The yardstick package has
yardstick::kap()
for weighted and unweighted Kappa statistics (the former being of more interest). Also,yardstick::classification_cost()
can utilize more complex cost structures and uses the class probabilities for estimation.
You can install the development version of ordered like so:
# install.packages("pak")
pak::pak("corybrunson/ordered", dependencies = FALSE)
Currently, ordered relies on engine and dial registration in the following forks:
pak::pak("corybrunson/parsnip@ordered", dependencies = FALSE)
pak::pak("corybrunson/dials@ordered", dependencies = FALSE)
Here is a simple example using computational chemistry data to predict the permeability of a molecule:
library(ordered)
#> Loading required package: parsnip
#> Warning: S3 method 'tunable.multinomial_reg' was declared in NAMESPACE but not
#> found
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data(caco, package = "QSARdata")
caco_dat <-
inner_join(caco_Outcome, caco_Dragon, by = "Molecule") %>%
as_tibble() %>%
select(class = Class, mol_weight = QikProp_mol_MW,
volume = QikProp_volume, ClogP)
caco_train <- caco_dat[-(1:5), ]
caco_test <- caco_dat[ (1:5), ]
ord_rf_spec <-
rand_forest(mtry = 2, trees = 100) %>% # you should really use many more trees
set_mode("classification") %>%
set_engine("ordinalForest")
set.seed(382)
ord_rf_fit <- ord_rf_spec %>% fit(class ~ ., data = caco_train)
augment(ord_rf_fit, caco_test)
#> # A tibble: 5 × 8
#> .pred_class .pred_L .pred_M .pred_H class mol_weight volume ClogP
#> <ord> <dbl> <dbl> <dbl> <ord> <dbl> <dbl> <dbl>
#> 1 M 0.346 0.404 0.250 M 123. 445. 0.799
#> 2 M 0.362 0.514 0.124 L 290. 856. 0.534
#> 3 M 0.204 0.786 0.01 M 519. 1576. 1.02
#> 4 M 0.190 0.788 0.0221 M 533. 1606. 1.58
#> 5 M 0.177 0.780 0.0435 M 505. 1517. 1.71
Please note that the ordered project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.