
Commit 6b57c58

Bring in monotonicity (#47)
Signed-off-by: Adam Li <adam2392@gmail.com>
1 parent 9655d01 commit 6b57c58

File tree

15 files changed: +1231, -59 lines

doc/whats_new/v1.4.rst

Lines changed: 32 additions & 0 deletions
@@ -59,6 +59,31 @@ TODO: update at the time of the release.
   passed to the ``fit`` method of the estimator. :pr:`26506` by `Adrin
   Jalali`_.
 
+:mod:`sklearn.ensemble`
+.......................
+
+- |Feature| :class:`ensemble.RandomForestClassifier`,
+  :class:`ensemble.RandomForestRegressor`, :class:`ensemble.ExtraTreesClassifier`
+  and :class:`ensemble.ExtraTreesRegressor` now support monotonic constraints,
+  useful when features are supposed to have a positive/negative effect on the target.
+  Missing values in the train data and multi-output targets are not supported.
+  :pr:`13649` by :user:`Samuel Ronsin <samronsin>`,
+  initiated by :user:`Patrick O'Reilly <pat-oreilly>`.
+
+:mod:`sklearn.tree`
+...................
+
+- |Feature| :class:`tree.DecisionTreeClassifier`, :class:`tree.DecisionTreeRegressor`,
+  :class:`tree.ExtraTreeClassifier` and :class:`tree.ExtraTreeRegressor` now support
+  monotonic constraints, useful when features are supposed to have a positive/negative
+  effect on the target. Missing values in the train data and multi-output targets are
+  not supported.
+  :pr:`13649` by :user:`Samuel Ronsin <samronsin>`, initiated by
+  :user:`Patrick O'Reilly <pat-oreilly>`.
+
 :mod:`sklearn.decomposition`
 ............................
 
@@ -68,3 +93,10 @@ TODO: update at the time of the release.
   when using a custom initialization. The default value of this parameter will change
   from `None` to `auto` in version 1.6.
   :pr:`26634` by :user:`Alexandre Landeau <AlexL>` and :user:`Alexandre Vigny <avigny>`.
+
+:mod:`sklearn.feature_selection`
+................................
+
+- |Fix| :func:`feature_selection.mutual_info_regression` now correctly computes the
+  result when `X` is of integer dtype. :pr:`26748` by :user:`Yao Xiao <Charlie-XIAO>`.
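For context, a minimal sketch of the monotonic-constraint feature announced above, on synthetic, illustrative data (assumes a build of this branch, where `monotonic_cst` is available on the tree estimators):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 2))
# The target trends upward in feature 0; feature 1 only carries noise.
y = 5 * X[:, 0] + np.sin(10 * np.pi * X[:, 1]) + rng.normal(scale=0.1, size=200)

# 1 = monotonic increase, 0 = no constraint, -1 = monotonic decrease
tree = DecisionTreeRegressor(monotonic_cst=[1, 0], random_state=0).fit(X, y)

# With feature 1 held fixed, predictions never decrease along feature 0.
grid = np.linspace(0, 1, 50)
pred = tree.predict(np.column_stack([grid, np.full(50, 0.5)]))
assert np.all(np.diff(pred) >= 0)
```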

sklearn/ensemble/_forest.py

Lines changed: 96 additions & 1 deletion
@@ -1525,9 +1525,31 @@ class RandomForestClassifier(ForestClassifier):
 
         .. versionadded:: 0.22
 
+    max_bins : int, default=255
+        The maximum number of bins to use for non-missing values.
+
     store_leaf_values : bool, default=False
         Whether to store the leaf values in the ``get_leaf_node_samples`` function.
 
+    monotonic_cst : array-like of int of shape (n_features), default=None
+        Indicates the monotonicity constraint to enforce on each feature.
+          - 1: monotonic increase
+          - 0: no constraint
+          - -1: monotonic decrease
+
+        If monotonic_cst is None, no constraints are applied.
+
+        Monotonicity constraints are not supported for:
+          - multiclass classifications (i.e. when `n_classes > 2`),
+          - multioutput classifications (i.e. when `n_outputs_ > 1`),
+          - classifications trained on data with missing values.
+
+        The constraints hold over the probability of the positive class.
+
+        Read more in the :ref:`User Guide <monotonic_cst_gbdt>`.
+
+        .. versionadded:: 1.4
+
     Attributes
     ----------
     estimator_ : :class:`~sklearn.tree.DecisionTreeClassifier`

@@ -1670,6 +1692,7 @@ def __init__(
         max_samples=None,
         max_bins=None,
         store_leaf_values=False,
+        monotonic_cst=None,
     ):
         super().__init__(
             estimator=DecisionTreeClassifier(),

@@ -1686,6 +1709,7 @@
                 "random_state",
                 "ccp_alpha",
                 "store_leaf_values",
+                "monotonic_cst",
             ),
             bootstrap=bootstrap,
             oob_score=oob_score,

@@ -1707,6 +1731,7 @@
         self.max_features = max_features
         self.max_leaf_nodes = max_leaf_nodes
         self.min_impurity_decrease = min_impurity_decrease
+        self.monotonic_cst = monotonic_cst
         self.ccp_alpha = ccp_alpha

@@ -1887,9 +1912,29 @@ class RandomForestRegressor(ForestRegressor):
 
         .. versionadded:: 0.22
 
+    max_bins : int, default=255
+        The maximum number of bins to use for non-missing values. Used for
+        speeding up training time.
+
     store_leaf_values : bool, default=False
         Whether to store the leaf values in the ``get_leaf_node_samples`` function.
 
+    monotonic_cst : array-like of int of shape (n_features), default=None
+        Indicates the monotonicity constraint to enforce on each feature.
+          - 1: monotonically increasing
+          - 0: no constraint
+          - -1: monotonically decreasing
+
+        If monotonic_cst is None, no constraints are applied.
+
+        Monotonicity constraints are not supported for:
+          - multioutput regressions (i.e. when `n_outputs_ > 1`),
+          - regressions trained on data with missing values.
+
+        Read more in the :ref:`User Guide <monotonic_cst_gbdt>`.
+
+        .. versionadded:: 1.4
+
     Attributes
     ----------
     estimator_ : :class:`~sklearn.tree.DecisionTreeRegressor`

@@ -2019,6 +2064,7 @@ def __init__(
         max_samples=None,
         max_bins=None,
         store_leaf_values=False,
+        monotonic_cst=None,
     ):
         super().__init__(
             estimator=DecisionTreeRegressor(),

@@ -2035,6 +2081,7 @@
                 "random_state",
                 "ccp_alpha",
                 "store_leaf_values",
+                "monotonic_cst",
             ),
             bootstrap=bootstrap,
             oob_score=oob_score,

@@ -2056,6 +2103,7 @@
         self.max_leaf_nodes = max_leaf_nodes
         self.min_impurity_decrease = min_impurity_decrease
         self.ccp_alpha = ccp_alpha
+        self.monotonic_cst = monotonic_cst
 
 
 class ExtraTreesClassifier(ForestClassifier):

@@ -2242,10 +2290,32 @@ class ExtraTreesClassifier(ForestClassifier):
         `max_samples` should be in the interval `(0.0, 1.0]`.
 
         .. versionadded:: 0.22
+
+    max_bins : int, default=255
+        The maximum number of bins to use for non-missing values.
 
     store_leaf_values : bool, default=False
         Whether to store the leaf values in the ``get_leaf_node_samples`` function.
 
+    monotonic_cst : array-like of int of shape (n_features), default=None
+        Indicates the monotonicity constraint to enforce on each feature.
+          - 1: monotonically increasing
+          - 0: no constraint
+          - -1: monotonically decreasing
+
+        If monotonic_cst is None, no constraints are applied.
+
+        Monotonicity constraints are not supported for:
+          - multiclass classifications (i.e. when `n_classes > 2`),
+          - multioutput classifications (i.e. when `n_outputs_ > 1`),
+          - classifications trained on data with missing values.
+
+        The constraints hold over the probability of the positive class.
+
+        Read more in the :ref:`User Guide <monotonic_cst_gbdt>`.
+
+        .. versionadded:: 1.4
+
     Attributes
     ----------
     estimator_ : :class:`~sklearn.tree.ExtraTreesClassifier`

@@ -2377,6 +2447,7 @@ def __init__(
         max_samples=None,
         max_bins=None,
         store_leaf_values=False,
+        monotonic_cst=None,
     ):
         super().__init__(
             estimator=ExtraTreeClassifier(),

@@ -2393,6 +2464,7 @@
                 "random_state",
                 "ccp_alpha",
                 "store_leaf_values",
+                "monotonic_cst",
             ),
             bootstrap=bootstrap,
             oob_score=oob_score,

@@ -2415,6 +2487,7 @@
         self.max_leaf_nodes = max_leaf_nodes
         self.min_impurity_decrease = min_impurity_decrease
         self.ccp_alpha = ccp_alpha
+        self.monotonic_cst = monotonic_cst
 
 
 class ExtraTreesRegressor(ForestRegressor):

@@ -2590,9 +2663,28 @@ class ExtraTreesRegressor(ForestRegressor):
 
         .. versionadded:: 0.22
 
+    max_bins : int, default=255
+        The maximum number of bins to use for non-missing values.
+
     store_leaf_values : bool, default=False
         Whether to store the leaf values in the ``get_leaf_node_samples`` function.
 
+    monotonic_cst : array-like of int of shape (n_features), default=None
+        Indicates the monotonicity constraint to enforce on each feature.
+          - 1: monotonically increasing
+          - 0: no constraint
+          - -1: monotonically decreasing
+
+        If monotonic_cst is None, no constraints are applied.
+
+        Monotonicity constraints are not supported for:
+          - multioutput regressions (i.e. when `n_outputs_ > 1`),
+          - regressions trained on data with missing values.
+
+        Read more in the :ref:`User Guide <monotonic_cst_gbdt>`.
+
+        .. versionadded:: 1.4
+
     Attributes
     ----------
     estimator_ : :class:`~sklearn.tree.ExtraTreeRegressor`

@@ -2707,6 +2799,7 @@ def __init__(
         max_samples=None,
         max_bins=None,
         store_leaf_values=False,
+        monotonic_cst=None,
     ):
         super().__init__(
             estimator=ExtraTreeRegressor(),

@@ -2723,6 +2816,7 @@
                 "random_state",
                 "ccp_alpha",
                 "store_leaf_values",
+                "monotonic_cst",
             ),
             bootstrap=bootstrap,
             oob_score=oob_score,

@@ -2744,6 +2838,7 @@
         self.max_leaf_nodes = max_leaf_nodes
         self.min_impurity_decrease = min_impurity_decrease
         self.ccp_alpha = ccp_alpha
+        self.monotonic_cst = monotonic_cst
 
 
 class RandomTreesEmbedding(TransformerMixin, BaseForest):

@@ -2937,7 +3032,7 @@ class RandomTreesEmbedding(TransformerMixin, BaseForest):
         **BaseDecisionTree._parameter_constraints,
         "sparse_output": ["boolean"],
     }
-    for param in ("max_features", "ccp_alpha", "splitter"):
+    for param in ("max_features", "ccp_alpha", "splitter", "monotonic_cst"):
         _parameter_constraints.pop(param)
 
     criterion = "squared_error"
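As the docstrings above note, for binary classifiers the constraint holds over the probability of the positive class. A hedged sketch of that semantics on synthetic, illustrative data (again assuming this branch):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.uniform(size=(500, 2))
# The positive class becomes more likely as feature 0 grows.
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0.5).astype(int)

clf = RandomForestClassifier(
    n_estimators=50, monotonic_cst=[1, 0], random_state=0
).fit(X, y)

# P(y=1) is non-decreasing along the constrained feature (feature 1 fixed);
# the tolerance absorbs float rounding when averaging the monotone trees.
grid = np.linspace(0, 1, 50)
proba = clf.predict_proba(np.column_stack([grid, np.full(50, 0.5)]))[:, 1]
assert np.all(np.diff(proba) >= -1e-12)
```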

sklearn/ensemble/_gb.py

Lines changed: 1 addition & 0 deletions
@@ -138,6 +138,7 @@ class BaseGradientBoosting(BaseEnsemble, metaclass=ABCMeta):
     }
     _parameter_constraints.pop("store_leaf_values")
     _parameter_constraints.pop("splitter")
+    _parameter_constraints.pop("monotonic_cst")
 
     @abstractmethod
     def __init__(

sklearn/feature_selection/_mutual_info.py

Lines changed: 1 addition & 4 deletions
@@ -280,15 +280,12 @@ def _estimate_mi(
 
     rng = check_random_state(random_state)
     if np.any(continuous_mask):
-        if copy:
-            X = X.copy()
-
+        X = X.astype(np.float64, copy=copy)
         X[:, continuous_mask] = scale(
             X[:, continuous_mask], with_mean=False, copy=False
         )
 
     # Add small noise to continuous features as advised in Kraskov et al.
-    X = X.astype(np.float64, copy=False)
     means = np.maximum(1, np.mean(np.abs(X[:, continuous_mask]), axis=0))
     X[:, continuous_mask] += (
         1e-10
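Why the reordering above matters: assigning scaled float values back into an integer-dtype `X` truncates them, which is what produced wrong results for integer input. A standalone NumPy illustration of the failure mode (not code from this diff):

```python
import numpy as np

X_int = np.arange(1, 5).reshape(2, 2)     # integer dtype, column 0 is [1, 3]
X_int[:, 0] = X_int[:, 0] / 10.0          # floats truncated to 0 on assignment
print(X_int[:, 0])                        # [0 0] -- information destroyed

# Casting to float64 first, as the fixed code does, preserves the values.
X_float = np.arange(1, 5).reshape(2, 2).astype(np.float64)
X_float[:, 0] = X_float[:, 0] / 10.0
print(X_float[:, 0])                      # [0.1 0.3]
```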

sklearn/feature_selection/tests/test_mutual_info.py

Lines changed: 15 additions & 0 deletions
@@ -236,3 +236,18 @@ def test_mutual_information_symmetry_classif_regression(correlated, global_random_seed):
     )
 
     assert mi_classif == pytest.approx(mi_regression)
+
+
+def test_mutual_info_regression_X_int_dtype(global_random_seed):
+    """Check that results agree when X is integer dtype and float dtype.
+
+    Non-regression test for Issue #26696.
+    """
+    rng = np.random.RandomState(global_random_seed)
+    X = rng.randint(100, size=(100, 10))
+    X_float = X.astype(np.float64, copy=True)
+    y = rng.randint(100, size=100)
+
+    expected = mutual_info_regression(X_float, y, random_state=global_random_seed)
+    result = mutual_info_regression(X, y, random_state=global_random_seed)
+    assert_allclose(result, expected)

sklearn/pipeline.py

Lines changed: 13 additions & 7 deletions
@@ -131,10 +131,11 @@ class Pipeline(_BaseComposition):
     >>> pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
     >>> # The pipeline can be used as any other estimator
     >>> # and avoids leaking the test set into the train set
-    >>> pipe.fit(X_train, y_train)
-    Pipeline(steps=[('scaler', StandardScaler()), ('svc', SVC())])
-    >>> pipe.score(X_test, y_test)
+    >>> pipe.fit(X_train, y_train).score(X_test, y_test)
     0.88
+    >>> # An estimator's parameter can be set using '__' syntax
+    >>> pipe.set_params(svc__C=10).fit(X_train, y_train).score(X_test, y_test)
+    0.76
     """
 
     # BaseEstimator interface

@@ -1051,6 +1052,10 @@ class FeatureUnion(TransformerMixin, _BaseComposition):
     >>> union.fit_transform(X)
     array([[ 1.5 ,  3.0...,  0.8...],
            [-1.5 ,  5.7..., -0.4...]])
+    >>> # An estimator's parameter can be set using '__' syntax
+    >>> union.set_params(pca__n_components=1).fit_transform(X)
+    array([[ 1.5 ,  3.0...],
+           [-1.5 ,  5.7...]])
     """
 
     _required_parameters = ["transformer_list"]

@@ -1362,11 +1367,12 @@ def __getitem__(self, name):
 
 
 def make_union(*transformers, n_jobs=None, verbose=False):
-    """Construct a FeatureUnion from the given transformers.
+    """Construct a :class:`FeatureUnion` from the given transformers.
 
-    This is a shorthand for the FeatureUnion constructor; it does not require,
-    and does not permit, naming the transformers. Instead, they will be given
-    names automatically based on their types. It also does not allow weighting.
+    This is a shorthand for the :class:`FeatureUnion` constructor; it does not
+    require, and does not permit, naming the transformers. Instead, they will
+    be given names automatically based on their types. It also does not allow
+    weighting.
 
     Parameters
     ----------
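A runnable companion to the doctest edits above, showing the `'__'` parameter routing and the automatic naming that `make_union` performs (synthetic data; output shapes follow from `make_classification` defaults):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.pipeline import Pipeline, make_union
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(random_state=0)

# '__' routes a parameter to the named step inside the pipeline.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
pipe.set_params(svc__C=10).fit(X, y)

# make_union names the transformers from their types: 'pca', 'truncatedsvd'.
union = make_union(PCA(n_components=1), TruncatedSVD(n_components=2))
print([name for name, _ in union.transformer_list])  # ['pca', 'truncatedsvd']
print(union.fit_transform(X).shape)                  # (100, 3)
```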

sklearn/preprocessing/_data.py

Lines changed: 3 additions & 2 deletions
@@ -721,11 +721,12 @@ class StandardScaler(OneToOneFeatureMixin, TransformerMixin, BaseEstimator):
 
     mean_ : ndarray of shape (n_features,) or None
         The mean value for each feature in the training set.
-        Equal to ``None`` when ``with_mean=False``.
+        Equal to ``None`` when ``with_mean=False`` and ``with_std=False``.
 
     var_ : ndarray of shape (n_features,) or None
         The variance for each feature in the training set. Used to compute
-        `scale_`. Equal to ``None`` when ``with_std=False``.
+        `scale_`. Equal to ``None`` when ``with_mean=False`` and
+        ``with_std=False``.
 
     n_features_in_ : int
         Number of features seen during :term:`fit`.
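To make the corrected attribute semantics concrete, a small sketch of the behavior the fixed docstring describes (the statistics are only skipped when both centering and scaling are disabled):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 1.0], [2.0, 3.0]])

# with_mean=False alone still computes mean_ and var_ (var_ feeds scale_).
scaler = StandardScaler(with_mean=False).fit(X)
print(scaler.mean_ is None, scaler.var_ is None)   # False False

# Only when both are disabled are the statistics set to None.
scaler = StandardScaler(with_mean=False, with_std=False).fit(X)
print(scaler.mean_ is None, scaler.var_ is None)   # True True
```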
