
Commit a6a6b0e

Merge branch 'scikit-learn:main' into submodulev2

2 parents 512f34c + 0a45f93

30 files changed: +631 -195 lines

SECURITY.md

Lines changed: 2 additions & 2 deletions

@@ -4,8 +4,8 @@
 
 | Version   | Supported          |
 | --------- | ------------------ |
-| 1.2.2     | :white_check_mark: |
-| < 1.2.2   | :x:                |
+| 1.3.0     | :white_check_mark: |
+| < 1.3.0   | :x:                |
 
 ## Reporting a Vulnerability
 

doc/conf.py

Lines changed: 2 additions & 1 deletion

@@ -66,7 +66,8 @@
 ]
 
 # Specify how to identify the prompt when copying code snippets
-copybutton_prompt_text = ">>> "
+copybutton_prompt_text = r">>> |\.\.\. "
+copybutton_prompt_is_regexp = True
 
 try:
     import jupyterlite_sphinx  # noqa: F401
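The conf.py change above switches the copy-button prompt matcher from a literal string to a regular expression, so that continuation prompts (`... `) are stripped from copied snippets along with the primary `>>> ` prompt. A minimal sketch of what the new pattern matches — this only illustrates the regex itself; sphinx-copybutton applies it per line in the browser:

```python
import re

# The pattern from doc/conf.py: with copybutton_prompt_is_regexp = True,
# it matches both the ">>> " primary prompt and the "... " continuation.
PROMPT = re.compile(r">>> |\.\.\. ")

snippet = [
    ">>> total = 0",
    "... for x in range(3):",
    "...     total += x",
]
# Strip one leading prompt per line, as the copy button effectively does.
copied = [PROMPT.sub("", line, count=1) for line in snippet]
print(copied)
```

With the old literal `">>> "` value, the continuation lines would have kept their `... ` prefix when copied.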

doc/developers/contributing.rst

Lines changed: 1 addition & 1 deletion

@@ -920,7 +920,7 @@ Building the documentation requires installing some additional packages:
 
     pip install sphinx sphinx-gallery numpydoc matplotlib Pillow pandas \
                 scikit-image packaging seaborn sphinx-prompt \
-                sphinxext-opengraph plotly pooch
+                sphinxext-opengraph sphinx-copybutton plotly pooch
 
 To build the documentation, you need to be in the ``doc`` folder:
 

doc/templates/index.html

Lines changed: 2 additions & 0 deletions

@@ -169,6 +169,8 @@ <h4 class="sk-landing-call-header">News</h4>
       <li><strong>On-going development:</strong>
         <a href="https://scikit-learn.org/dev/whats_new.html"><strong>What's new</strong> (Changelog)</a>
       </li>
+      <li><strong>June 2023.</strong> scikit-learn 1.3.0 is available for download (<a href="whats_new/v1.3.html#version-1-3-0">Changelog</a>).
+      </li>
       <li><strong>March 2023.</strong> scikit-learn 1.2.2 is available for download (<a href="whats_new/v1.2.html#version-1-2-2">Changelog</a>).
       </li>
       <li><strong>January 2023.</strong> scikit-learn 1.2.1 is available for download (<a href="whats_new/v1.2.html#version-1-2-1">Changelog</a>).

doc/themes/scikit-learn-modern/static/css/theme.css

Lines changed: 2 additions & 3 deletions

@@ -1019,13 +1019,12 @@ div.sphx-glr-thumbcontainer {
   padding: 0;
 }
 
-
 @media screen and (min-width: 1540px) {
-  .sphx-glr-download-link-note {
-    position: absolute;
+  div.sphx-glr-download-link-note.admonition.note {
     position: absolute;
     left: 98%;
     width: 20ex;
+    margin-top: calc(max(5.75rem, 1vh));
   }
 }
 

doc/whats_new/v1.3.rst

Lines changed: 41 additions & 2 deletions

@@ -7,7 +7,10 @@
 Version 1.3.0
 =============
 
-**In Development**
+**June 2023**
+
+For a short description of the main highlights of the release, please refer to
+:ref:`sphx_glr_auto_examples_release_highlights_plot_release_highlights_1_3_0.py`.
 
 .. include:: changelog_legend.inc
 
@@ -770,4 +773,40 @@ Code and Documentation Contributors
 Thanks to everyone who has contributed to the maintenance and improvement of
 the project since version 1.2, including:
 
-TODO: update at the time of the release.
+2357juan, Abhishek Singh Kushwah, Adam Handke, Adam Kania, Adam Li, adienes,
+Admir Demiraj, adoublet, Adrin Jalali, A.H.Mansouri, Ahmedbgh, Ala-Na, Alex
+Buzenet, AlexL, Ali H. El-Kassas, amay, András Simon, André Pedersen, Andrew
+Wang, Ankur Singh, annegnx, Ansam Zedan, Anthony22-dev, Artur Hermano, Arturo
+Amor, as-90, ashah002, Ashish Dutt, Ashwin Mathur, AymericBasset, Azaria
+Gebremichael, Barata Tripramudya Onggo, Benedek Harsanyi, Benjamin Bossan,
+Bharat Raghunathan, Binesh Bannerjee, Boris Feld, Brendan Lu, Brevin Kunde,
+cache-missing, Camille Troillard, Carla J, carlo, Carlo Lemos, c-git, Changyao
+Chen, Chiara Marmo, Christian Lorentzen, Christian Veenhuis, Christine P. Chai,
+crispinlogan, Da-Lan, DanGonite57, Dave Berenbaum, davidblnc, david-cortes,
+Dayne, Dea María Léon, Denis, Dimitri Papadopoulos Orfanos, Dimitris
+Litsidis, Dmitry Nesterov, Dominic Fox, Dominik Prodinger, Edern, Ekaterina
+Butyugina, Elabonga Atuo, Emir, farhan khan, Felipe Siola, futurewarning, Gael
+Varoquaux, genvalen, Gleb Levitski, Guillaume Lemaitre, gunesbayir, Haesun
+Park, hujiahong726, i-aki-y, Ian Thompson, Ido M, Ily, Irene, Jack McIvor,
+jakirkham, James Dean, JanFidor, Jarrod Millman, JB Mountford, Jérémie du
+Boisberranger, Jessicakk0711, Jiawei Zhang, Joey Ortiz, JohnathanPi, John
+Pangas, Joshua Choo Yun Keat, Joshua Hedlund, JuliaSchoepp, Julien Jerphanion,
+jygerardy, ka00ri, Kaushik Amar Das, Kento Nozawa, Kian Eliasi, Kilian Kluge,
+Lene Preuss, Linus, Logan Thomas, Loic Esteve, Louis Fouquet, Lucy Liu, Madhura
+Jayaratne, Marc Torrellas Socastro, Maren Westermann, Mario Kostelac, Mark
+Harfouche, Marko Toplak, Marvin Krawutschke, Masanori Kanazu, mathurinm, Matt
+Haberland, Max Halford, maximeSaur, Maxwell Liu, m. bou, mdarii, Meekail Zain,
+Mikhail Iljin, murezzda, Nawazish Alam, Nicola Fanelli, Nightwalkx, Nikolay
+Petrov, Nishu Choudhary, NNLNR, npache, Olivier Grisel, Omar Salman, ouss1508,
+PAB, Pandata, partev, Peter Piontek, Phil, pnucci, Pooja M, Pooja Subramaniam,
+precondition, Quentin Barthélemy, Rafal Wojdyla, Raghuveer Bhat, Rahil Parikh,
+Ralf Gommers, ram vikram singh, Rushil Desai, Sadra Barikbin, SANJAI_3, Sashka
+Warner, Scott Gigante, Scott Gustafson, searchforpassion, Seoeun
+Hong, Shady el Gewily, Shiva chauhan, Shogo Hida, Shreesha Kumar Bhat, sonnivs,
+Sortofamudkip, Stanislav (Stanley) Modrak, Stefanie Senger, Steven Van
+Vaerenbergh, Tabea Kossen, Théophile Baranger, Thijs van Weezel, Thomas A
+Caswell, Thomas Germer, Thomas J. Fan, Tim Head, Tim P, Tom Dupré la Tour,
+tomiock, tspeng, Valentin Laurent, Veghit, VIGNESH D, Vijeth Moudgalya, Vinayak
+Mehta, Vincent M, Vincent-violet, Vyom Pathak, William M, windiana42, Xiao
+Yuan, Yao Xiao, Yaroslav Halchenko, Yotam Avidar-Constantini, Yuchen Zhou,
+Yusuf Raji, zeeshan lone

examples/applications/plot_topics_extraction_with_nmf_lda.py

Lines changed: 2 additions & 3 deletions

@@ -46,14 +46,13 @@ def plot_top_words(model, feature_names, n_top_words, title):
     fig, axes = plt.subplots(2, 5, figsize=(30, 15), sharex=True)
     axes = axes.flatten()
     for topic_idx, topic in enumerate(model.components_):
-        top_features_ind = topic.argsort()[: -n_top_words - 1 : -1]
-        top_features = [feature_names[i] for i in top_features_ind]
+        top_features_ind = topic.argsort()[-n_top_words:]
+        top_features = feature_names[top_features_ind]
         weights = topic[top_features_ind]
 
         ax = axes[topic_idx]
         ax.barh(top_features, weights, height=0.7)
         ax.set_title(f"Topic {topic_idx +1}", fontdict={"fontsize": 30})
-        ax.invert_yaxis()
         ax.tick_params(axis="both", which="major", labelsize=20)
         for i in "top right left".split():
             ax.spines[i].set_visible(False)
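The slicing change above is behavior-preserving: both idioms select the `n_top_words` highest-weighted terms, just in opposite order, which is why the explicit `ax.invert_yaxis()` call could be dropped (`barh` draws ascending input bottom-up, so ascending order reads top weight at the top). A small numpy check with toy weights, not the example's real data:

```python
import numpy as np

# Toy topic-weight vector standing in for one row of model.components_.
topic = np.array([0.1, 0.9, 0.3, 0.7, 0.2])
n_top_words = 3

old_ind = topic.argsort()[: -n_top_words - 1 : -1]  # top indices, descending
new_ind = topic.argsort()[-n_top_words:]            # same indices, ascending

print(old_ind, new_ind)
```

The two results are reverses of each other, confirming the simplification picks the same features.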
examples/release_highlights/plot_release_highlights_1_3_0.py (new file)

Lines changed: 156 additions & 0 deletions

@@ -0,0 +1,156 @@
+# flake8: noqa
+"""
+=======================================
+Release Highlights for scikit-learn 1.3
+=======================================
+
+.. currentmodule:: sklearn
+
+We are pleased to announce the release of scikit-learn 1.3! Many bug fixes
+and improvements were added, as well as some new key features. We detail
+below a few of the major features of this release. **For an exhaustive list of
+all the changes**, please refer to the :ref:`release notes <changes_1_3>`.
+
+To install the latest version (with pip)::
+
+    pip install --upgrade scikit-learn
+
+or with conda::
+
+    conda install -c conda-forge scikit-learn
+
+"""
+
+# %%
+# Metadata Routing
+# ----------------
+# We are in the process of introducing a new way to route metadata such as
+# ``sample_weight`` throughout the codebase, which would affect how
+# meta-estimators such as :class:`pipeline.Pipeline` and
+# :class:`model_selection.GridSearchCV` route metadata. While the
+# infrastructure for this feature is already included in this release, the work
+# is ongoing and not all meta-estimators support this new feature. You can read
+# more about this feature in the :ref:`Metadata Routing User Guide
+# <metadata_routing>`. Note that this feature is still under development and
+# not implemented for most meta-estimators.
+#
+# Third party developers can already start incorporating this into their
+# meta-estimators. For more details, see
+# :ref:`metadata routing developer guide
+# <sphx_glr_auto_examples_miscellaneous_plot_metadata_routing.py>`.
+
+# %%
+# HDBSCAN: hierarchical density-based clustering
+# ----------------------------------------------
+# Originally hosted in the scikit-learn-contrib repository, :class:`cluster.HDBSCAN`
+# has been adopted into scikit-learn. It's missing a few features from the original
+# implementation which will be added in future releases.
+# By performing a modified version of :class:`cluster.DBSCAN` over multiple epsilon
+# values simultaneously, :class:`cluster.HDBSCAN` finds clusters of varying densities
+# making it more robust to parameter selection than :class:`cluster.DBSCAN`.
+# More details in the :ref:`User Guide <hdbscan>`.
+import numpy as np
+from sklearn.cluster import HDBSCAN
+from sklearn.datasets import load_digits
+from sklearn.metrics import v_measure_score
+
+X, true_labels = load_digits(return_X_y=True)
+print(f"number of digits: {len(np.unique(true_labels))}")
+
+hdbscan = HDBSCAN(min_cluster_size=15).fit(X)
+non_noisy_labels = hdbscan.labels_[hdbscan.labels_ != -1]
+print(f"number of clusters found: {len(np.unique(non_noisy_labels))}")
+
+print(v_measure_score(true_labels[hdbscan.labels_ != -1], non_noisy_labels))
+
+# %%
+# TargetEncoder: a new category encoding strategy
+# -----------------------------------------------
+# Well suited for categorical features with high cardinality,
+# :class:`preprocessing.TargetEncoder` encodes the categories based on a shrunk
+# estimate of the average target values for observations belonging to that category.
+# More details in the :ref:`User Guide <target_encoder>`.
+import numpy as np
+from sklearn.preprocessing import TargetEncoder
+
+X = np.array([["cat"] * 30 + ["dog"] * 20 + ["snake"] * 38], dtype=object).T
+y = [90.3] * 30 + [20.4] * 20 + [21.2] * 38
+
+enc = TargetEncoder(random_state=0)
+X_trans = enc.fit_transform(X, y)
+
+enc.encodings_
+
+# %%
+# Missing values support in decision trees
+# ----------------------------------------
+# The classes :class:`tree.DecisionTreeClassifier` and
+# :class:`tree.DecisionTreeRegressor` now support missing values. For each potential
+# threshold on the non-missing data, the splitter will evaluate the split with all the
+# missing values going to the left node or the right node.
+# More details in the :ref:`User Guide <tree_missing_value_support>`.
+import numpy as np
+from sklearn.tree import DecisionTreeClassifier
+
+X = np.array([0, 1, 6, np.nan]).reshape(-1, 1)
+y = [0, 0, 1, 1]
+
+tree = DecisionTreeClassifier(random_state=0).fit(X, y)
+tree.predict(X)
+
+# %%
+# New display `model_selection.ValidationCurveDisplay`
+# ----------------------------------------------------
+# :class:`model_selection.ValidationCurveDisplay` is now available to plot results
+# from :func:`model_selection.validation_curve`.
+from sklearn.datasets import make_classification
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import ValidationCurveDisplay
+
+X, y = make_classification(1000, 10, random_state=0)
+
+_ = ValidationCurveDisplay.from_estimator(
+    LogisticRegression(),
+    X,
+    y,
+    param_name="C",
+    param_range=np.geomspace(1e-5, 1e3, num=9),
+    score_type="both",
+    score_name="Accuracy",
+)
+
+# %%
+# Gamma loss for gradient boosting
+# --------------------------------
+# The class :class:`ensemble.HistGradientBoostingRegressor` supports the
+# Gamma deviance loss function via `loss="gamma"`. This loss function is useful for
+# modeling strictly positive targets with a right-skewed distribution.
+import numpy as np
+from sklearn.model_selection import cross_val_score
+from sklearn.datasets import make_low_rank_matrix
+from sklearn.ensemble import HistGradientBoostingRegressor
+
+n_samples, n_features = 500, 10
+rng = np.random.RandomState(0)
+X = make_low_rank_matrix(n_samples, n_features, random_state=rng)
+coef = rng.uniform(low=-10, high=20, size=n_features)
+y = rng.gamma(shape=2, scale=np.exp(X @ coef) / 2)
+gbdt = HistGradientBoostingRegressor(loss="gamma")
+cross_val_score(gbdt, X, y).mean()
+
+# %%
+# Grouping infrequent categories in :class:`preprocessing.OrdinalEncoder`
+# -----------------------------------------------------------------------
+# Similarly to :class:`preprocessing.OneHotEncoder`, the class
+# :class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories
+# into a single output for each feature. The parameters to enable the gathering of
+# infrequent categories are `min_frequency` and `max_categories`.
+# See the :ref:`User Guide <encoder_infrequent_categories>` for more details.
+from sklearn.preprocessing import OrdinalEncoder
+import numpy as np
+
+X = np.array(
+    [["dog"] * 5 + ["cat"] * 20 + ["rabbit"] * 10 + ["snake"] * 3], dtype=object
+).T
+enc = OrdinalEncoder(min_frequency=6).fit(X)
+enc.infrequent_categories_
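The missing-value strategy described in the decision-tree highlight above can be sketched in plain numpy. This is a hedged toy reimplementation of the split search the text describes — for each finite threshold, the NaN samples are tried first in the left child and then in the right child, keeping the lowest-impurity option — not scikit-learn's actual Cython splitter; `gini` and `best_split` are illustrative helpers:

```python
import numpy as np

def gini(y):
    """Gini impurity of an integer label array (illustrative helper)."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every finite threshold, routing NaNs left then right."""
    x = X.ravel()
    miss = np.isnan(x)
    best = (np.inf, None, None)  # (weighted impurity, threshold, nan_go_left)
    for t in np.unique(x[~miss]):
        for nan_left in (True, False):
            left = (~miss & (x <= t)) | (miss & nan_left)
            right = ~left
            imp = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / len(y)
            if imp < best[0]:
                best = (imp, t, nan_left)
    return best

# Same toy data as the highlight's DecisionTreeClassifier snippet.
X = np.array([0, 1, 6, np.nan]).reshape(-1, 1)
y = np.array([0, 0, 1, 1])
print(best_split(X, y))
```

On this data the search finds a pure split at threshold 1 with the NaN sample sent right, matching the tree's perfect predictions in the highlight.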

sklearn/covariance/_empirical_covariance.py

Lines changed: 7 additions & 0 deletions

@@ -23,6 +23,13 @@
 from ..utils.extmath import fast_logdet
 
 
+@validate_params(
+    {
+        "emp_cov": [np.ndarray],
+        "precision": [np.ndarray],
+    },
+    prefer_skip_nested_validation=True,
+)
 def log_likelihood(emp_cov, precision):
     """Compute the sample mean of the log_likelihood under a covariance model.

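The decorator added above declares, per parameter, which types `log_likelihood` accepts, so invalid input fails fast with a clear error before any computation. Here is a hypothetical, stripped-down sketch of that idea — `validate_params_sketch` and `log_likelihood_sketch` are invented names, the function body is a stand-in rather than scikit-learn's actual log-likelihood formula, and the real decorator supports far richer constraints (intervals, options, nested-validation skipping):

```python
import functools
import numpy as np

def validate_params_sketch(constraints):
    """Toy decorator: type-check declared arguments before calling."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            names = func.__code__.co_varnames[: func.__code__.co_argcount]
            bound = dict(zip(names, args), **kwargs)
            for name, types in constraints.items():
                if name in bound and not isinstance(bound[name], tuple(types)):
                    raise TypeError(f"{name!r} must be an instance of {types}")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@validate_params_sketch({"emp_cov": [np.ndarray], "precision": [np.ndarray]})
def log_likelihood_sketch(emp_cov, precision):
    # Placeholder body; the real function computes a Gaussian log-likelihood.
    return float(np.sum(emp_cov * precision))

print(log_likelihood_sketch(np.eye(2), np.eye(2)))  # passes validation
```

Passing a plain list instead of an `np.ndarray` would raise a `TypeError` inside the wrapper, before the body runs.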
sklearn/datasets/_twenty_newsgroups.py

Lines changed: 1 addition & 1 deletion

@@ -209,7 +209,7 @@ def fetch_20newsgroups(
         make the assumption that the samples are independent and identically
         distributed (i.i.d.), such as stochastic gradient descent.
 
-    random_state : int, RandomState instance or None, default=None
+    random_state : int, RandomState instance or None, default=42
        Determines random number generation for dataset shuffling. Pass an int
        for reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.
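The docstring fix above aligns the documented default with the function's actual `random_state=42`. Why a fixed integer default matters: the same seed produces the identical shuffle on every call, whereas `None` reseeds from the OS each time. A quick pure-numpy illustration (no dataset download needed):

```python
import numpy as np

# Two independent generators seeded with the same integer produce the
# same permutation, so shuffling is reproducible across calls.
a = np.random.RandomState(42).permutation(10)
b = np.random.RandomState(42).permutation(10)
print(np.array_equal(a, b))
```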
