Commit c219a6b

DOC fixes for LogisticRegression newton-cholesky and multiclass (scikit-learn#31410)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
1 parent a2ceff3 commit c219a6b

2 files changed: 23 additions and 19 deletions

doc/modules/linear_model.rst

Lines changed: 12 additions & 10 deletions
@@ -1022,7 +1022,7 @@ The following table summarizes the penalties and multinomial multiclass supporte
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | **Penalties** | **'lbfgs'** | **'liblinear'** | **'newton-cg'** | **'newton-cholesky'** | **'sag'** | **'saga'** |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
-| L2 penalty | yes | no | yes | no | yes | yes |
+| L2 penalty | yes | yes | yes | yes | yes | yes |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | L1 penalty | no | yes | no | no | no | yes |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
@@ -1032,7 +1032,7 @@ The following table summarizes the penalties and multinomial multiclass supporte
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | **Multiclass support** | |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
-| multinomial multiclass | yes | no | yes | no | yes | yes |
+| multinomial multiclass | yes | no | yes | yes | yes | yes |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | **Behaviors** | |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
@@ -1043,8 +1043,11 @@ The following table summarizes the penalties and multinomial multiclass supporte
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | Robust to unscaled datasets | yes | yes | yes | yes | no | no |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+

-The "lbfgs" solver is used by default for its robustness. For large datasets
-the "saga" solver is usually faster.
+The "lbfgs" solver is used by default for its robustness. For
+`n_samples >> n_features`, "newton-cholesky" is a good choice and can reach high
+precision (tiny `tol` values). For large datasets
+the "saga" solver is usually faster (than "lbfgs"), in particular for low precision
+(high `tol`).
 For large dataset, you may also consider using :class:`SGDClassifier`
 with `loss="log_loss"`, which might be even faster but requires more tuning.
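
To make this guidance concrete, here is a minimal sketch of the different solver choices; the synthetic data from `make_classification` and the specific `tol`/`max_iter` values are illustrative placeholders, not recommendations taken from the documentation itself.

```python
# Illustrative sketch of the solver guidance above (placeholder data and settings).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

# Synthetic stand-in for a "tall" dataset with n_samples >> n_features.
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)

# "lbfgs" is the robust default.
clf_default = LogisticRegression().fit(X, y)

# Tall data where high precision matters: "newton-cholesky" with a tiny tol.
clf_nc = LogisticRegression(solver="newton-cholesky", tol=1e-8).fit(X, y)

# Large datasets at modest precision: "saga" is often faster than "lbfgs";
# per the table above, "saga" (and "liblinear") also supports the L1 penalty.
clf_saga = LogisticRegression(solver="saga", penalty="l1", tol=1e-2, max_iter=500).fit(X, y)

# Even larger data: SGDClassifier with the logistic loss, at the cost of more tuning.
clf_sgd = SGDClassifier(loss="log_loss").fit(X, y)
```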

@@ -1101,13 +1104,12 @@ zero, is likely to be an underfit, bad model and you are advised to set
   scaled datasets and on datasets with one-hot encoded categorical features with rare
   categories.

-* The "newton-cholesky" solver is an exact Newton solver that calculates the hessian
+* The "newton-cholesky" solver is an exact Newton solver that calculates the Hessian
   matrix and solves the resulting linear system. It is a very good choice for
-  `n_samples` >> `n_features`, but has a few shortcomings: Only :math:`\ell_2`
-  regularization is supported. Furthermore, because the hessian matrix is explicitly
-  computed, the memory usage has a quadratic dependency on `n_features` as well as on
-  `n_classes`. As a consequence, only the one-vs-rest scheme is implemented for the
-  multiclass case.
+  `n_samples` >> `n_features` and can reach high precision (tiny values of `tol`),
+  but has a few shortcomings: Only :math:`\ell_2` regularization is supported.
+  Furthermore, because the Hessian matrix is explicitly computed, the memory usage
+  has a quadratic dependency on `n_features` as well as on `n_classes`.

 For a comparison of some of these solvers, see [9]_.
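
The quadratic memory dependency mentioned in this hunk lends itself to a back-of-envelope estimate. The helper below is purely illustrative (it does not mirror the solver's actual allocations) and assumes a dense float64 Hessian whose side grows roughly like `n_features * n_classes` in the multinomial case.

```python
# Back-of-envelope estimate of the Hessian size for the "newton-cholesky" solver.
# Illustrative only: the real allocations differ in the details, but the quadratic
# growth in n_features and n_classes is the point made in the documentation above.
def approx_hessian_gib(n_features, n_classes=2, bytes_per_value=8):
    # Binary problems need a single coefficient vector of length n_features;
    # multinomial problems couple the classes, so the Hessian side scales
    # roughly with n_features * n_classes.
    side = n_features if n_classes == 2 else n_features * n_classes
    return side * side * bytes_per_value / 2**30

for n_features, n_classes in [(1_000, 2), (10_000, 2), (10_000, 10)]:
    gib = approx_hessian_gib(n_features, n_classes)
    print(f"n_features={n_features}, n_classes={n_classes}: ~{gib:.2f} GiB")
```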

sklearn/linear_model/_logistic.py

Lines changed: 11 additions & 9 deletions
@@ -337,7 +337,7 @@ def _logistic_regression_path(

     else:
         if solver in ["sag", "saga", "lbfgs", "newton-cg", "newton-cholesky"]:
-            # SAG, lbfgs, newton-cg and newton-cg multinomial solvers need
+            # SAG, lbfgs, newton-cg and newton-cholesky multinomial solvers need
             # LabelEncoder, not LabelBinarizer, i.e. y as a 1d-array of integers.
             # LabelEncoder also saves memory compared to LabelBinarizer, especially
             # when n_classes is large.
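
For context on the comment above, the snippet below shows the difference between the two label encodings it mentions; the toy labels are unrelated to the solver's internal code path.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

y = np.array(["bird", "cat", "dog", "dog", "cat"])

# LabelEncoder: a 1d array of class indices, shape (n_samples,).
y_int = LabelEncoder().fit_transform(y)       # array([0, 1, 2, 2, 1])

# LabelBinarizer: a one-hot matrix, shape (n_samples, n_classes),
# which is why it costs more memory when n_classes is large.
y_onehot = LabelBinarizer().fit_transform(y)

print(y_int.shape, y_onehot.shape)  # (5,) (5, 3)
```
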
@@ -837,9 +837,9 @@ class LogisticRegression(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
 the L2 penalty. The Elastic-Net regularization is only supported by the
 'saga' solver.

-For :term:`multiclass` problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
-handle multinomial loss. 'liblinear' and 'newton-cholesky' only handle binary
-classification but can be extended to handle multiclass by using
+For :term:`multiclass` problems, all solvers but 'liblinear' optimize the
+(penalized) multinomial loss. 'liblinear' only handle binary classification but can
+be extended to handle multiclass by using
 :class:`~sklearn.multiclass.OneVsRestClassifier`.

 Read more in the :ref:`User Guide <logistic_regression>`.
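
As a rough illustration of the two multiclass strategies described in this docstring change (using iris purely as example data):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# Every solver except 'liblinear' fits one multinomial model directly.
multinomial = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)
print(multinomial.coef_.shape)  # (3, 4): one coefficient row per class, single model

# 'liblinear' is binary-only, but can be wrapped in a one-vs-rest scheme.
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear")).fit(X, y)
print(len(ovr.estimators_))     # 3 independent binary classifiers
```
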
@@ -957,13 +957,14 @@ class LogisticRegression(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
 summarizing solver/penalty supports.

 .. versionadded:: 0.17
-    Stochastic Average Gradient descent solver.
+    Stochastic Average Gradient (SAG) descent solver. Multinomial support in
+    version 0.18.
 .. versionadded:: 0.19
     SAGA solver.
 .. versionchanged:: 0.22
-    The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
+    The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
 .. versionadded:: 1.2
-    newton-cholesky solver.
+    newton-cholesky solver. Multinomial support in version 1.6.

 max_iter : int, default=100
     Maximum number of iterations taken for the solvers to converge.
@@ -1597,11 +1598,12 @@ class LogisticRegressionCV(LogisticRegression, LinearClassifierMixin, BaseEstima
 a scaler from :mod:`sklearn.preprocessing`.

 .. versionadded:: 0.17
-    Stochastic Average Gradient descent solver.
+    Stochastic Average Gradient (SAG) descent solver. Multinomial support in
+    version 0.18.
 .. versionadded:: 0.19
     SAGA solver.
 .. versionadded:: 1.2
-    newton-cholesky solver.
+    newton-cholesky solver. Multinomial support in version 1.6.

 tol : float, default=1e-4
     Tolerance for stopping criteria.
