Divergence from GLMnet when using a matrix with many variables #78

@tiemvanderdeure

Description

When fitting models with a large number of variables, Lasso.jl and GLMNet return different paths, and the difference grows as the number of variables increases.

An example to illustrate this:

using Lasso, GLMNet, Statistics

# fits identical models in Lasso and GLMNet from mock data
# and returns the mean absolute difference of the betas of both models
function lasso_glmnet_dif(nrow, ncol, n_col_contributing)
    data = rand(nrow, ncol)
    # per-row mean of the contributing columns, so those columns drive the outcome
    outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow)
    presence_matrix = [1 .- outcome outcome]

    l = Lasso.fit(LassoPath, data, outcome, Binomial())
    g = GLMNet.glmnet(data, presence_matrix, Binomial())

    lcoefs = Vector(l.coefs[:,end])
    gcoefs = g.betas[:, end]

    mean(abs, lcoefs .- gcoefs)
end

# 1000 records, 5 variables that all contribute to outcome
lasso_glmnet_dif(1000, 5, 5) # order of magnitude 1e-9
# 1000 records, 1000 variables of which 5 contribute to the outcome
lasso_glmnet_dif(1000, 1000, 5) # around 0.05
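One possible confound in the comparison above (not ruled out in this report) is that each library chooses its own λ sequence, so the last path entries may correspond to different penalties. A sketch that pins both fits to the same λ grid, assuming the `λ` keyword of `Lasso.fit` and the `lambda` keyword of `GLMNet.glmnet` as documented in each package:

```julia
using Lasso, GLMNet, Statistics

# Hypothetical variant of lasso_glmnet_dif that fixes a shared, decreasing λ
# grid so both libraries are compared at the same penalty values.
function lasso_glmnet_dif_fixed_lambda(nrow, ncol, n_col_contributing)
    data = rand(nrow, ncol)
    outcome = mean(data[:, 1:n_col_contributing], dims = 2)[:, 1] .> rand(nrow)
    presence_matrix = [1 .- outcome outcome]

    # log-spaced penalties from 0.1 down to 1e-4 (both packages expect a
    # decreasing sequence)
    λs = exp.(range(log(0.1), log(1e-4); length = 50))

    l = Lasso.fit(LassoPath, data, outcome, Binomial(); λ = λs)
    g = GLMNet.glmnet(data, presence_matrix, Binomial(); lambda = λs)

    mean(abs, Vector(l.coefs[:, end]) .- g.betas[:, end])
end
```

Note that either solver may still stop early on a requested grid, so checking that both returned paths actually cover the full λ sequence before comparing would make the comparison stricter.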

The context for this problem is that I'm working on a Julia implementation of maxnet, where a big-ish model matrix is generated (100s of columns) and a lasso path is used to select the most important ones.
