Max margin interval trees
There are few R packages available for interval regression, a machine learning problem which is important in genomics and medicine. Like usual regression, the goal is to learn a function that inputs a feature vector and outputs a real-valued prediction. Unlike usual regression, each output in the training set is an interval of acceptable values (rather than one value). In the terminology of the survival analysis literature, this is regression with “left, right, and interval censored” output/response data.
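As a minimal sketch of what such training targets look like, the Python snippet below encodes each output as a (lower, upper) pair, with infinite limits standing for censoring. The data values and the helper names `censoring_type` and `is_acceptable` are made up for illustration and do not come from any of the packages discussed here.

```python
import math

# Each training target is an interval (lower, upper) of acceptable values.
# Infinite limits encode censoring; these numbers are illustrative only.
targets = [
    (-math.inf, 2.0),  # left censored: only an upper limit is known
    (1.5, math.inf),   # right censored: only a lower limit is known
    (0.5, 3.0),        # interval censored: both limits are known
    (2.0, 2.0),        # uncensored: the usual regression case
]

def censoring_type(lower, upper):
    """Name the kind of censoring encoded by one target interval (hypothetical helper)."""
    if lower == -math.inf and upper == math.inf:
        return "uninformative"
    if lower == -math.inf:
        return "left"
    if upper == math.inf:
        return "right"
    if lower == upper:
        return "none"
    return "interval"

def is_acceptable(prediction, lower, upper):
    """A prediction has zero error when it falls inside the target interval."""
    return lower <= prediction <= upper

types = [censoring_type(lo, up) for lo, up in targets]
hits = [is_acceptable(1.8, lo, up) for lo, up in targets]
```

Here a single prediction of 1.8 is acceptable for the three censored targets but not for the uncensored target, which requires exactly 2.0.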
Max margin interval trees is a new nonlinear model for this problem (TODO: cite paper when published). A dynamic programming algorithm finds the optimal split point for each feature. This solver is implemented in C++, with wrappers in R and Python (https://github.com/aldro61/mmit). The Python package includes a decision tree learner, but the R package does not yet have one. The goal of this project is to write an R package that implements the decision tree learner in R.
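To make the split search concrete, here is a brute-force sketch in Python. It is not the dynamic programming algorithm from the mmit solver: it simply tries every threshold on one feature and, for each side, picks the constant prediction that minimizes a linear hinge loss. The function names, the `margin` parameter, and the toy data are assumptions for illustration, and this version is far slower than the real solver.

```python
import math

def hinge_cost(prediction, intervals, margin=0.0):
    """Sum of linear hinge losses of one constant prediction over target intervals."""
    total = 0.0
    for lower, upper in intervals:
        total += max(0.0, lower + margin - prediction)    # prediction too low
        total += max(0.0, prediction - (upper - margin))  # prediction too high
    return total

def best_constant(intervals, margin=0.0):
    """Return (cost, prediction) minimizing the hinge cost within one leaf.

    The cost is convex and piecewise linear, so a minimum is attained
    at one of its breakpoints (a shifted finite interval limit).
    """
    breakpoints = [lo + margin for lo, _ in intervals if math.isfinite(lo)]
    breakpoints += [up - margin for _, up in intervals if math.isfinite(up)]
    if not breakpoints:
        return 0.0, 0.0  # no finite limits: every prediction has zero cost
    return min((hinge_cost(b, intervals, margin), b) for b in breakpoints)

def best_split(feature, intervals, margin=0.0):
    """Try each threshold between distinct sorted feature values.

    Returns (total cost, threshold) for the cheapest split, or (inf, None)
    when the feature takes a single value.
    """
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    xs = [feature[i] for i in order]
    ys = [intervals[i] for i in order]
    best_cost, best_threshold = math.inf, None
    for k in range(1, len(xs)):
        if xs[k - 1] == xs[k]:
            continue  # no threshold separates equal feature values
        left_cost, _ = best_constant(ys[:k], margin)
        right_cost, _ = best_constant(ys[k:], margin)
        cost = left_cost + right_cost
        if cost < best_cost:
            best_cost, best_threshold = cost, (xs[k - 1] + xs[k]) / 2
    return best_cost, best_threshold

# Toy data: small feature values go with low targets, large ones with high targets.
feature = [1.0, 2.0, 3.0, 4.0]
intervals = [(-math.inf, 1.0), (0.0, 1.5), (3.0, math.inf), (3.5, 5.0)]
result = best_split(feature, intervals)
```

On this toy data the threshold 2.5 separates the low targets from the high ones with zero total cost. A brute-force search like this may still be useful as a slow reference when testing a faster implementation.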
The transformation forest model of Hothorn and Zeileis implements a decision tree model which can be trained on censored outputs (https://arxiv.org/abs/1701.02110). The trtf package on R-Forge implements this nonlinear model.
There are several linear models which can be trained on censored outputs.
- iregnet implements elastic net regularized Accelerated Failure Time models.
- penaltyLearning::IntervalRegressionCV implements a solver for the squared hinge loss and L1 regularization.
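For readers unfamiliar with the squared hinge loss, the Python sketch below evaluates a generic L1-regularized squared hinge objective for a linear model on interval targets. It illustrates the general idea only and is not claimed to match the exact formulation inside penaltyLearning; the function names and the `margin` and `penalty` defaults are hypothetical.

```python
import math

def squared_hinge(x):
    """Squared hinge: zero when x >= 0, quadratic penalty when x < 0."""
    return x * x if x < 0 else 0.0

def objective(weights, intercept, features, intervals, margin=1.0, penalty=0.1):
    """Mean squared hinge loss over interval limits plus an L1 penalty on the weights."""
    total = 0.0
    for row, (lower, upper) in zip(features, intervals):
        prediction = intercept + sum(w * x for w, x in zip(weights, row))
        # Each finite limit contributes a loss when the prediction is
        # not at least `margin` inside the interval; infinite limits add zero.
        total += squared_hinge(prediction - lower - margin)
        total += squared_hinge(upper - prediction - margin)
    return total / len(features) + penalty * sum(abs(w) for w in weights)

# Illustrative data: one left censored and one right censored target.
features = [[1.0], [2.0]]
intervals = [(-math.inf, 1.0), (3.0, math.inf)]
value = objective([2.0], 0.0, features, intervals, margin=1.0, penalty=0.5)
```

In this example the first prediction (2.0) overshoots its upper limit and is penalized, the second (4.0) sits exactly one margin above its lower limit and is not, and the L1 term adds the penalty times the absolute weight.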
TODO: can we implement this in the framework of partykit?
This project will provide an R implementation of the max margin interval tree model for interval regression, which currently has only a Python implementation.
- Toby Hocking <toby.hocking@r-project.org> is a co-author of the Max Margin Interval Trees paper, author of the R package penaltyLearning, and mentor of the students who implemented the iregnet package (GSOC2016-2017).
- TODO Other Dev <other.dev@gmail.com> is an expert at implementing decision tree algorithms in R.
partykit, iregnet, IntervalRegressionCV ?
- Easy: …
- Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
- Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?
Students, please post a link to your test results here.