SAGA sparse linear models
R has excellent support for regularized linear models, which are important machine learning models. For example, the glmnet package implements a coordinate descent algorithm for elastic-net-regularized generalized linear models. However, there is not yet an R implementation of the fast new SAGA algorithm. The goal of this project is to provide an R package implementing SAGA for solving L1-regularized generalized linear models.
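As a rough illustration of the kind of model this project targets, here is a minimal sketch of fitting an L1-regularized logistic regression with glmnet (assuming glmnet is installed; the feature matrix and labels below are simulated toy data, not from the project):

```r
## Minimal sketch: L1-regularized (lasso) logistic regression with glmnet.
## X and y are simulated toy data used only for illustration.
library(glmnet)
set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)        # toy feature matrix
y <- rbinom(100, size = 1, prob = plogis(X[, 1] - X[, 2])) # toy binary labels
fit <- glmnet(X, y, family = "binomial", alpha = 1)        # alpha = 1 selects the pure L1 penalty
coef(fit, s = 0.05)                                        # sparse coefficients at lambda = 0.05
```

A SAGA-based package would aim to produce the same kind of sparse coefficient vectors, but via a stochastic incremental-gradient solver rather than coordinate descent.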
Scikit-learn in Python implements the SAGA algorithm as a solver option for its linear models (e.g. LogisticRegression with solver="saga"), and we could probably copy/modify their code.
The SAG GSOC’15 project implemented a similar algorithm (SAG) for solving L2-regularized problems.
- Port the SAGA code from Python to an R package.
- Write a vignette with speed/optimality comparisons against glmnet (see the timing sketch after this list).
- Put it on CRAN.
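A minimal sketch of the kind of timing comparison the vignette could contain (only the glmnet side is shown here; the call to the new SAGA solver would be timed in exactly the same way and added to the plot):

```r
## Sketch: record glmnet fitting time as the number of rows grows.
## The analogous SAGA-solver timings would be collected the same way for comparison.
library(glmnet)
set.seed(1)
timings <- do.call(rbind, lapply(c(1000, 2000, 4000), function(n) {
  X <- matrix(rnorm(n * 50), nrow = n, ncol = 50)  # simulated feature matrix
  y <- rnorm(n)                                    # simulated response
  elapsed <- system.time(glmnet(X, y, alpha = 1))[["elapsed"]]
  data.frame(rows = n, seconds = elapsed)
}))
print(timings)
```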
Sparse generalized linear models are widely used in the Stats/ML/R community, and this package would provide a fast new algorithm for computing them.
Students, please contact the mentors below after completing at least one of the tests listed at the bottom of this page.
- Toby Hocking <toby.hocking@r-project.org> is a machine learning researcher who mentored the previous SAG R-GSOC student.
- NEED ANOTHER MENTOR: any interested students should email the scikit-learn developers to find out who is familiar with their SAGA implementation, and then ask that person whether they would be willing to mentor.
Students, please do one or more of the following tests before contacting the mentors above.
- Easy: use glmnet to compute an L1-regularized linear model of the spam data in library(ElemStatLearn). What features are selected for the prediction function?
- Medium: compute timings of glmnet (with alpha=0) and bigoptim for data sets (feature matrices) with varying numbers of rows and columns. Plot timings versus the number of rows and columns. Which is faster? Which is more accurate in terms of objective function values (smaller is better)?
- Hard: write an R package with a simple C++ function (one way to prototype the C++ part is sketched below).
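For the hard test, one common approach (not the only one) is to prototype the C++ function with Rcpp before wrapping it in a full package; a minimal sketch, assuming the Rcpp package is installed:

```r
## Minimal sketch: a simple C++ function callable from R via Rcpp.
## For a full package, Rcpp::Rcpp.package.skeleton() generates the src/ and R/ layout to build on.
library(Rcpp)
cppFunction('
NumericVector scaleVector(NumericVector x, double a) {
  // multiply every element of x by a
  return x * a;
}
')
scaleVector(c(1, 2, 3), 2.5)  # returns 2.5 5.0 7.5
```

The name scaleVector is just an illustrative placeholder; any small, well-tested C++ routine exposed to R satisfies the test.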
Students, please post a link to your test results here.