
Max margin interval trees

Toby Dylan Hocking edited this page Jul 4, 2017 · 21 revisions

Background

There are few R packages available for interval regression, a machine learning problem which is important in genomics and medicine. Like usual regression, the goal is to learn a function that inputs a feature vector and outputs a real-valued prediction. Unlike usual regression, each output in the training set is an interval of acceptable values (rather than one value). In the terminology of the survival analysis literature, this is regression with “left, right, and interval censored” output/response data.
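To make the data format concrete, here is a minimal sketch of interval-valued targets in plain Python. The toy values and the helper name `is_acceptable` are illustrative assumptions, not part of any package mentioned on this page. Infinite limits stand in for left and right censored outputs.

```python
import math

# Each training target is an interval (lower, upper) of acceptable values.
# Infinite limits encode censoring; the numbers are made-up toy data.
targets = [
    (-math.inf, 2.0),  # left censored: any value up to 2.0 is acceptable
    (1.5, 3.0),        # interval censored: both limits are finite
    (4.0, math.inf),   # right censored: any value from 4.0 up is acceptable
]

def is_acceptable(prediction, interval):
    """Return True if the prediction lies inside the target interval."""
    lower, upper = interval
    return lower <= prediction <= upper

# Count how many predictions fall outside their target interval.
predictions = [1.0, 3.5, 4.0]
errors = sum(not is_acceptable(p, t) for p, t in zip(predictions, targets))
```

A prediction counts as correct whenever it falls anywhere inside its interval, which is what distinguishes this setting from usual regression with a single target value.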

Max margin interval trees is a new nonlinear model for this problem (TODO: cite paper when published). A dynamic programming algorithm finds the optimal split point for each feature. This algorithm is implemented in C++, with R and Python wrappers around the solver (https://github.com/aldro61/mmit). The Python package also includes a decision tree learner, but the R package does not yet have one. The goal of this project is to write an R package that implements the decision tree learner.
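The split point search described above can be illustrated with a brute-force sketch. This is not the dynamic programming algorithm from the mmit solver: it simply tries every threshold and recomputes the cost from scratch. The linear hinge loss, the `margin` parameter, and all function names below are assumptions made for illustration only.

```python
import math

def hinge_loss(pred, lower, upper, margin=1.0):
    """Assumed loss: zero when pred is at least `margin` inside the interval."""
    return max(0.0, lower + margin - pred) + max(0.0, pred - (upper - margin))

def best_constant(intervals, margin=1.0):
    """Best single prediction for a set of intervals, and its total loss.

    The total loss is convex and piecewise linear, so a minimum lies at one
    of the breakpoints lower + margin or upper - margin.
    """
    breakpoints = [lo + margin for lo, _ in intervals if math.isfinite(lo)]
    breakpoints += [up - margin for _, up in intervals if math.isfinite(up)]
    if not breakpoints:  # no finite limits: every prediction has zero loss
        return 0.0, 0.0

    def total(p):
        return sum(hinge_loss(p, lo, up, margin) for lo, up in intervals)

    best = min(breakpoints, key=total)
    return best, total(best)

def best_split(feature, intervals, margin=1.0):
    """Exhaustively try every threshold between distinct feature values."""
    values = sorted(set(feature))
    best = None
    for left_val, right_val in zip(values, values[1:]):
        threshold = (left_val + right_val) / 2
        left = [iv for x, iv in zip(feature, intervals) if x <= threshold]
        right = [iv for x, iv in zip(feature, intervals) if x > threshold]
        cost = best_constant(left, margin)[1] + best_constant(right, margin)[1]
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best  # (threshold, total cost), or None if no split is possible

# Toy data: small feature values go with low intervals, large with high ones.
feature = [1.0, 2.0, 3.0, 4.0]
intervals = [(-math.inf, 2.0), (0.0, 2.0), (5.0, 8.0), (6.0, math.inf)]
threshold, cost = best_split(feature, intervals)
```

Each candidate threshold costs time linear in the number of observations here, so the whole search is quadratic per feature. Avoiding that recomputation is the motivation for a dynamic programming solver.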

Related work

The transformation forest model of Hothorn and Zeileis implements a decision tree model which can be trained on censored outputs (https://arxiv.org/abs/1701.02110). The trtf package on R-Forge implements this nonlinear model.

There are several linear models which can be trained on censored outputs.

Details of your coding project

TODO: can we implement this in the framework of partykit?

Expected impact

This project will provide an R implementation of the max margin interval tree model for interval regression, which currently only has a Python implementation.

Mentors

  • Alexandre Drouin <alexandre.drouin.8@ulaval.ca> is a co-author of the Max Margin Interval Trees paper, and author of the Python mmit module and C++ code.
  • Torsten Hothorn <Torsten.Hothorn@r-project.org> is an expert at implementing decision tree algorithms in R; he is the author of the trtf and partykit packages.
  • Backup mentor: Toby Hocking <toby.hocking@r-project.org> is a co-author of the Max Margin Interval Trees paper, author of the R package penaltyLearning, and mentor of the students who implemented the iregnet package (GSOC2016-2017).

Tests

partykit, iregnet, IntervalRegressionCV ?

  • Easy: run some R code that shows you know how to train and test a decision tree model (rpart, partykit, etc). Bonus points if you can get trtf running for an interval regression problem.
  • Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
  • Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?

Solutions of tests

Students, please post a link to your test results here.
