SAGA sparse linear models

Toby Dylan Hocking edited this page Jan 27, 2018 · 16 revisions

Background

R has excellent support for regularized linear models, which are an important class of machine learning models. For example, the glmnet package implements a coordinate descent algorithm for elastic-net regularized generalized linear models. However, there is not yet an R implementation of the fast new SAGA algorithm. The goal of this project is to provide an R package that implements SAGA for solving L1-regularized generalized linear models.

Related work

scikit-learn (Python) implements the SAGA algorithm as the solver="saga" option of its LogisticRegression class, and we could probably copy or adapt that code.

The SAG GSOC’15 project implemented a similar algorithm (SAG) for solving L2-regularized problems.

Details of your coding project

  • Port the SAGA code from Python to an R package.
  • Write a vignette with speed and optimality comparisons against glmnet.
  • Submit the package to CRAN.
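
As a rough illustration of the kind of update that would be ported, here is a minimal pure-Python sketch of SAGA for the L1-regularized least-squares objective (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1. It is not the scikit-learn code. The function names (soft_threshold, lasso_objective, saga_lasso), the constant step size 1/(3L), and the zero-initialized gradient table are assumptions made for this example.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value| (shrinks value toward zero)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(X, y, w, lam):
    """(1/(2n)) * sum of squared residuals + lam * L1 norm of w."""
    n = len(X)
    squared = 0.0
    for row, target in zip(X, y):
        residual = sum(xj * wj for xj, wj in zip(row, w)) - target
        squared += residual * residual
    return squared / (2.0 * n) + lam * sum(abs(wj) for wj in w)


def saga_lasso(X, y, lam, n_epochs=200, seed=0):
    """Sketch of proximal SAGA for L1-regularized least squares.

    X is a list of rows (lists of floats) and y is a list of floats.
    Assumption: constant step 1/(3L), with L the largest squared row norm.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    lipschitz = max(sum(v * v for v in row) for row in X)
    step = 1.0 / (3.0 * lipschitz)

    w = [0.0] * p
    # For a linear model the gradient of sample i is (scalar) * x_i,
    # so storing one scalar per sample is enough for the gradient table.
    stored = [0.0] * n
    # Running average of the stored per-sample gradients.
    avg = [0.0] * p

    for _ in range(n_epochs * n):
        i = rng.randrange(n)
        row = X[i]
        # Fresh residual for sample i at the current weights.
        fresh = sum(xj * wj for xj, wj in zip(row, w)) - y[i]
        diff = fresh - stored[i]
        for j in range(p):
            # SAGA direction: fresh gradient - stored gradient + table average.
            direction = diff * row[j] + avg[j]
            # Gradient step followed by the L1 proximal step.
            w[j] = soft_threshold(w[j] - step * direction, step * lam)
            # Keep the average consistent with the updated table entry.
            avg[j] += diff * row[j] / n
        stored[i] = fresh
    return w


if __name__ == "__main__":
    # Tiny design with orthogonal columns, chosen only for illustration.
    X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    y = [2.05, 1.95, -1.95, -2.05]
    w = saga_lasso(X, y, lam=0.1)
    print(w, lasso_objective(X, y, w, lam=0.1))
```

On this toy design the columns are orthogonal, so the exact solution is the soft-thresholded value of X'y/n, namely (1.9, 0), and the sketch should approach it. The second coefficient is shrunk to zero, which is the sparsity that L1 regularization is meant to produce. An R package would most likely implement the inner loop in C or C++ and handle sparse inputs, which this sketch does not attempt.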

Expected impact

Sparse generalized linear models are widely used in the Stats/ML/R community, and this package would provide a fast new algorithm for computing them.

Mentors

Students, please contact the mentors listed here after completing at least one of the tests below.

  • Toby Hocking <toby.hocking@r-project.org> is a machine learning researcher who mentored the previous SAG R-GSOC student.
  • NEED ANOTHER MENTOR.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: use glmnet to compute an L1-regularized linear model of the spam data in library(ElemStatLearn). Which features are selected for the prediction function?
  • Medium: compute timings of glmnet (with alpha=0) and bigoptim for data sets (feature matrices) with varying numbers of rows and columns. Plot timings versus number of rows and columns – which is faster? Which is more accurate, in terms of objective function values? (smaller is better)
  • Hard: write an R package with a simple C++ function.

Solutions of tests

Students, please post a link to your test results here.
