Commit c219a6b

DOC fixes for LogisticRegression newton-cholesky and multiclass (scikit-learn#31410)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
1 parent a2ceff3 commit c219a6b

2 files changed: 23 additions and 19 deletions

doc/modules/linear_model.rst

Lines changed: 12 additions & 10 deletions
@@ -1022,7 +1022,7 @@ The following table summarizes the penalties and multinomial multiclass supporte
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | **Penalties** | **'lbfgs'** | **'liblinear'** | **'newton-cg'** | **'newton-cholesky'** | **'sag'** | **'saga'** |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
-| L2 penalty | yes | no | yes | no | yes | yes |
+| L2 penalty | yes | yes | yes | yes | yes | yes |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | L1 penalty | no | yes | no | no | no | yes |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
@@ -1032,7 +1032,7 @@ The following table summarizes the penalties and multinomial multiclass supporte
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | **Multiclass support** | |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
-| multinomial multiclass | yes | no | yes | no | yes | yes |
+| multinomial multiclass | yes | no | yes | yes | yes | yes |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | **Behaviors** | |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
@@ -1043,8 +1043,11 @@ The following table summarizes the penalties and multinomial multiclass supporte
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+
 | Robust to unscaled datasets | yes | yes | yes | yes | no | no |
 +------------------------------+-------------+-----------------+-----------------+-----------------------+-----------+------------+

-The "lbfgs" solver is used by default for its robustness. For large datasets
-the "saga" solver is usually faster.
+The "lbfgs" solver is used by default for its robustness. For
+`n_samples >> n_features`, "newton-cholesky" is a good choice and can reach high
+precision (tiny `tol` values). For large datasets
+the "saga" solver is usually faster (than "lbfgs"), in particular for low precision
+(high `tol`).
 For large dataset, you may also consider using :class:`SGDClassifier`
 with `loss="log_loss"`, which might be even faster but requires more tuning.
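
To make this guidance concrete, here is a minimal sketch of the different solver choices; the synthetic data from `make_classification` and the specific `tol`/`max_iter` values are illustrative placeholders, not recommendations taken from the documentation itself.

```python
# Illustrative sketch of the solver guidance above (placeholder data and settings).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

# Synthetic stand-in for a "tall" dataset with n_samples >> n_features.
X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)

# "lbfgs" is the robust default.
clf_default = LogisticRegression().fit(X, y)

# Tall data where high precision matters: "newton-cholesky" with a tiny tol.
clf_nc = LogisticRegression(solver="newton-cholesky", tol=1e-8).fit(X, y)

# Large datasets at modest precision: "saga" is often faster than "lbfgs";
# per the table above, "saga" (and "liblinear") also supports the L1 penalty.
clf_saga = LogisticRegression(solver="saga", penalty="l1", tol=1e-2, max_iter=500).fit(X, y)

# Even larger data: SGDClassifier with the logistic loss, at the cost of more tuning.
clf_sgd = SGDClassifier(loss="log_loss").fit(X, y)
```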

@@ -1101,13 +1104,12 @@ zero, is likely to be an underfit, bad model and you are advised to set
   scaled datasets and on datasets with one-hot encoded categorical features with rare
   categories.

-* The "newton-cholesky" solver is an exact Newton solver that calculates the hessian
+* The "newton-cholesky" solver is an exact Newton solver that calculates the Hessian
   matrix and solves the resulting linear system. It is a very good choice for
-  `n_samples` >> `n_features`, but has a few shortcomings: Only :math:`\ell_2`
-  regularization is supported. Furthermore, because the hessian matrix is explicitly
-  computed, the memory usage has a quadratic dependency on `n_features` as well as on
-  `n_classes`. As a consequence, only the one-vs-rest scheme is implemented for the
-  multiclass case.
+  `n_samples` >> `n_features` and can reach high precision (tiny values of `tol`),
+  but has a few shortcomings: Only :math:`\ell_2` regularization is supported.
+  Furthermore, because the Hessian matrix is explicitly computed, the memory usage
+  has a quadratic dependency on `n_features` as well as on `n_classes`.

 For a comparison of some of these solvers, see [9]_.
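
The quadratic memory dependency mentioned in this hunk lends itself to a back-of-envelope estimate. The helper below is purely illustrative (it does not mirror the solver's actual allocations) and assumes a dense float64 Hessian whose side grows roughly like `n_features * n_classes` in the multinomial case.

```python
# Back-of-envelope estimate of the Hessian size for the "newton-cholesky" solver.
# Illustrative only: the real allocations differ in the details, but the quadratic
# growth in n_features and n_classes is the point made in the documentation above.
def approx_hessian_gib(n_features, n_classes=2, bytes_per_value=8):
    # Binary problems need a single coefficient vector of length n_features;
    # multinomial problems couple the classes, so the Hessian side scales
    # roughly with n_features * n_classes.
    side = n_features if n_classes == 2 else n_features * n_classes
    return side * side * bytes_per_value / 2**30

for n_features, n_classes in [(1_000, 2), (10_000, 2), (10_000, 10)]:
    gib = approx_hessian_gib(n_features, n_classes)
    print(f"n_features={n_features}, n_classes={n_classes}: ~{gib:.2f} GiB")
```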

sklearn/linear_model/_logistic.py

Lines changed: 11 additions & 9 deletions
@@ -337,7 +337,7 @@ def _logistic_regression_path(

     else:
         if solver in ["sag", "saga", "lbfgs", "newton-cg", "newton-cholesky"]:
-            # SAG, lbfgs, newton-cg and newton-cg multinomial solvers need
+            # SAG, lbfgs, newton-cg and newton-cholesky multinomial solvers need
             # LabelEncoder, not LabelBinarizer, i.e. y as a 1d-array of integers.
             # LabelEncoder also saves memory compared to LabelBinarizer, especially
             # when n_classes is large.
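
For context on the comment above, the snippet below shows the difference between the two label encodings it mentions; the toy labels are unrelated to the solver's internal code path.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

y = np.array(["bird", "cat", "dog", "dog", "cat"])

# LabelEncoder: a 1d array of class indices, shape (n_samples,).
y_int = LabelEncoder().fit_transform(y)       # array([0, 1, 2, 2, 1])

# LabelBinarizer: a one-hot matrix, shape (n_samples, n_classes),
# which is why it costs more memory when n_classes is large.
y_onehot = LabelBinarizer().fit_transform(y)

print(y_int.shape, y_onehot.shape)  # (5,) (5, 3)
```
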
@@ -837,9 +837,9 @@ class LogisticRegression(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
 the L2 penalty. The Elastic-Net regularization is only supported by the
 'saga' solver.

-For :term:`multiclass` problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
-handle multinomial loss. 'liblinear' and 'newton-cholesky' only handle binary
-classification but can be extended to handle multiclass by using
+For :term:`multiclass` problems, all solvers but 'liblinear' optimize the
+(penalized) multinomial loss. 'liblinear' only handle binary classification but can
+be extended to handle multiclass by using
 :class:`~sklearn.multiclass.OneVsRestClassifier`.

 Read more in the :ref:`User Guide <logistic_regression>`.
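
As a rough illustration of the two multiclass strategies described in this docstring change (using iris purely as example data):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# Every solver except 'liblinear' fits one multinomial model directly.
multinomial = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)
print(multinomial.coef_.shape)  # (3, 4): one coefficient row per class, single model

# 'liblinear' is binary-only, but can be wrapped in a one-vs-rest scheme.
ovr = OneVsRestClassifier(LogisticRegression(solver="liblinear")).fit(X, y)
print(len(ovr.estimators_))     # 3 independent binary classifiers
```
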
@@ -957,13 +957,14 @@ class LogisticRegression(LinearClassifierMixin, SparseCoefMixin, BaseEstimator):
 summarizing solver/penalty supports.

 .. versionadded:: 0.17
-    Stochastic Average Gradient descent solver.
+    Stochastic Average Gradient (SAG) descent solver. Multinomial support in
+    version 0.18.
 .. versionadded:: 0.19
     SAGA solver.
 .. versionchanged:: 0.22
-    The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
+    The default solver changed from 'liblinear' to 'lbfgs' in 0.22.
 .. versionadded:: 1.2
-    newton-cholesky solver.
+    newton-cholesky solver. Multinomial support in version 1.6.

 max_iter : int, default=100
     Maximum number of iterations taken for the solvers to converge.
@@ -1597,11 +1598,12 @@ class LogisticRegressionCV(LogisticRegression, LinearClassifierMixin, BaseEstima
 a scaler from :mod:`sklearn.preprocessing`.

 .. versionadded:: 0.17
-    Stochastic Average Gradient descent solver.
+    Stochastic Average Gradient (SAG) descent solver. Multinomial support in
+    version 0.18.
 .. versionadded:: 0.19
     SAGA solver.
 .. versionadded:: 1.2
-    newton-cholesky solver.
+    newton-cholesky solver. Multinomial support in version 1.6.

 tol : float, default=1e-4
     Tolerance for stopping criteria.
