
Commit 6638d28

ArturoAmorQ and glemaitre authored
DOC Fix dropdown-related warnings (scikit-learn#27418)
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
1 parent c838875 commit 6638d28

File tree: 2 files changed (+68, -61 lines)


doc/modules/compose.rst

Lines changed: 65 additions & 55 deletions
@@ -54,9 +54,8 @@ The last estimator may be any type (transformer, classifier, etc.).
 Usage
 -----
 
-|details-start|
-**Construction**
-|details-split|
+Build a pipeline
+................
 
 The :class:`Pipeline` is built using a list of ``(key, value)`` pairs, where
 the ``key`` is a string containing the name you want to give this step and ``value``
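
Not part of the commit diff: the constructor call that this paragraph leads into falls outside the hunk. A minimal sketch of building such a pipeline, assuming the same ``reduce_dim``/``clf`` step names used in the surrounding documentation::

    >>> from sklearn.pipeline import Pipeline
    >>> from sklearn.decomposition import PCA
    >>> from sklearn.svm import SVC
    >>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
    >>> pipe = Pipeline(estimators)
    >>> pipe
    Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
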
@@ -70,6 +69,10 @@ is an estimator object::
 >>> pipe
 Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
 
+|details-start|
+**Shorthand version using :func:`make_pipeline`**
+|details-split|
+
 The utility function :func:`make_pipeline` is a shorthand
 for constructing pipelines;
 it takes a variable number of estimators and returns a pipeline,
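
Not part of the commit diff: a minimal sketch of the :func:`make_pipeline` shorthand described above; the estimators chosen here are illustrative, and the step names are derived automatically from the class names::

    >>> from sklearn.pipeline import make_pipeline
    >>> from sklearn.preprocessing import Binarizer
    >>> from sklearn.naive_bayes import MultinomialNB
    >>> make_pipeline(Binarizer(), MultinomialNB())
    Pipeline(steps=[('binarizer', Binarizer()), ('multinomialnb', MultinomialNB())])
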
@@ -81,14 +84,26 @@ filling in the names automatically::
 
 |details-end|
 
+Access pipeline steps
+.....................
+
+The estimators of a pipeline are stored as a list in the ``steps`` attribute.
+A sub-pipeline can be extracted using the slicing notation commonly used
+for Python Sequences such as lists or strings (although only a step of 1 is
+permitted). This is convenient for performing only some of the transformations
+(or their inverse):
+
+>>> pipe[:1]
+Pipeline(steps=[('reduce_dim', PCA())])
+>>> pipe[-1:]
+Pipeline(steps=[('clf', SVC())])
+
 |details-start|
-**Accessing steps**
+**Accessing a step by name or position**
 |details-split|
 
-
-The estimators of a pipeline are stored as a list in the ``steps`` attribute,
-but can be accessed by index or name by indexing (with ``[idx]``) the
-Pipeline::
+A specific step can also be accessed by index or name by indexing (with ``[idx]``) the
+pipeline::
 
 >>> pipe.steps[0]
 ('reduce_dim', PCA())
@@ -97,36 +112,61 @@ Pipeline::
 >>> pipe['reduce_dim']
 PCA()
 
-Pipeline's `named_steps` attribute allows accessing steps by name with tab
+`Pipeline`'s `named_steps` attribute allows accessing steps by name with tab
 completion in interactive environments::
 
 >>> pipe.named_steps.reduce_dim is pipe['reduce_dim']
 True
 
-A sub-pipeline can also be extracted using the slicing notation commonly used
-for Python Sequences such as lists or strings (although only a step of 1 is
-permitted). This is convenient for performing only some of the transformations
-(or their inverse):
+|details-end|
 
->>> pipe[:1]
-Pipeline(steps=[('reduce_dim', PCA())])
->>> pipe[-1:]
-Pipeline(steps=[('clf', SVC())])
+Tracking feature names in a pipeline
+....................................
 
-|details-end|
+To enable model inspection, :class:`~sklearn.pipeline.Pipeline` has a
+``get_feature_names_out()`` method, just like all transformers. You can use
+pipeline slicing to get the feature names going into each step::
 
-.. _pipeline_nested_parameters:
+>>> from sklearn.datasets import load_iris
+>>> from sklearn.feature_selection import SelectKBest
+>>> iris = load_iris()
+>>> pipe = Pipeline(steps=[
+... ('select', SelectKBest(k=2)),
+... ('clf', LogisticRegression())])
+>>> pipe.fit(iris.data, iris.target)
+Pipeline(steps=[('select', SelectKBest(...)), ('clf', LogisticRegression(...))])
+>>> pipe[:-1].get_feature_names_out()
+array(['x2', 'x3'], ...)
 
 |details-start|
-**Nested parameters**
+**Customize feature names**
 |details-split|
 
-Parameters of the estimators in the pipeline can be accessed using the
-``<estimator>__<parameter>`` syntax::
+You can also provide custom feature names for the input data using
+``get_feature_names_out``::
+
+>>> pipe[:-1].get_feature_names_out(iris.feature_names)
+array(['petal length (cm)', 'petal width (cm)'], ...)
+
+|details-end|
+
+.. _pipeline_nested_parameters:
+
+Access to nested parameters
+...........................
+
+It is common to adjust the parameters of an estimator within a pipeline. This parameter
+is therefore nested because it belongs to a particular sub-step. Parameters of the
+estimators in the pipeline are accessible using the ``<estimator>__<parameter>``
+syntax::
 
 >>> pipe.set_params(clf__C=10)
 Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC(C=10))])
 
+|details-start|
+**When does it matter?**
+|details-split|
+
 This is particularly important for doing grid searches::
 
 >>> from sklearn.model_selection import GridSearchCV
@@ -143,36 +183,11 @@ ignored by setting them to ``'passthrough'``::
 ... clf__C=[0.1, 10, 100])
 >>> grid_search = GridSearchCV(pipe, param_grid=param_grid)
 
-The estimators of the pipeline can be retrieved by index:
-
->>> pipe[0]
-PCA()
-
-or by name::
-
->>> pipe['reduce_dim']
-PCA()
-
-To enable model inspection, :class:`~sklearn.pipeline.Pipeline` has a
-``get_feature_names_out()`` method, just like all transformers. You can use
-pipeline slicing to get the feature names going into each step::
-
->>> from sklearn.datasets import load_iris
->>> from sklearn.feature_selection import SelectKBest
->>> iris = load_iris()
->>> pipe = Pipeline(steps=[
-... ('select', SelectKBest(k=2)),
-... ('clf', LogisticRegression())])
->>> pipe.fit(iris.data, iris.target)
-Pipeline(steps=[('select', SelectKBest(...)), ('clf', LogisticRegression(...))])
->>> pipe[:-1].get_feature_names_out()
-array(['x2', 'x3'], ...)
+.. topic:: See Also:
 
-You can also provide custom feature names for the input data using
-``get_feature_names_out``::
+* :ref:`composite_grid_search`
 
->>> pipe[:-1].get_feature_names_out(iris.feature_names)
-array(['petal length (cm)', 'petal width (cm)'], ...)
+|details-end|
 
 .. topic:: Examples:

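
Not part of the commit diff: the grid-search snippet is split across hunks; a sketch of the complete usage, assuming the ``reduce_dim``/``clf`` pipeline from earlier (the exact parameter grid in the documentation may differ)::

    >>> from sklearn.model_selection import GridSearchCV
    >>> param_grid = dict(reduce_dim__n_components=[2, 5, 10],
    ...                   clf__C=[0.1, 10, 100])
    >>> grid_search = GridSearchCV(pipe, param_grid=param_grid)

As the hunk context notes, whole steps can also be replaced or disabled by listing the step name itself in the grid, e.g. ``reduce_dim=['passthrough', PCA(5)]``.
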
@@ -184,11 +199,6 @@ You can also provide custom feature names for the input data using
 * :ref:`sphx_glr_auto_examples_compose_plot_compare_reduction.py`
 * :ref:`sphx_glr_auto_examples_miscellaneous_plot_pipeline_display.py`
 
-.. topic:: See Also:
-
-* :ref:`composite_grid_search`
-
-|details-end|
 
 .. _pipeline_cache:

doc/modules/feature_extraction.rst

Lines changed: 3 additions & 6 deletions
@@ -225,7 +225,7 @@ it is advisable to use a power of two as the ``n_features`` parameter;
 otherwise the features will not be mapped evenly to the columns.
 
 .. topic:: References:
-
+
 * `MurmurHash3 <https://github.com/aappleby/smhasher>`_.
 
 |details-end|
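
Not part of the commit diff: a minimal sketch illustrating the power-of-two advice in this hunk's context line, using :class:`~sklearn.feature_extraction.FeatureHasher` with made-up input data::

    >>> from sklearn.feature_extraction import FeatureHasher
    >>> hasher = FeatureHasher(n_features=2**10, input_type='string')
    >>> X = hasher.transform([['cat', 'dog'], ['cat']])
    >>> X.shape
    (2, 1024)
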
@@ -398,9 +398,8 @@ last document::
 
 .. _stop_words:
 
-|details-start|
-**Using stop words**
-|details-split|
+Using stop words
+----------------
 
 Stop words are words like "and", "the", "him", which are presumed to be
 uninformative in representing the content of a text, and which may be
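
Not part of the commit diff: a minimal sketch of enabling the built-in English stop word list in :class:`~sklearn.feature_extraction.text.CountVectorizer`, on a made-up document::

    >>> from sklearn.feature_extraction.text import CountVectorizer
    >>> vectorizer = CountVectorizer(stop_words='english')
    >>> vectorizer.fit(['the cat and the dog'])
    CountVectorizer(stop_words='english')
    >>> vectorizer.get_feature_names_out()
    array(['cat', 'dog'], dtype=object)
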
@@ -431,8 +430,6 @@ identify and warn about some kinds of inconsistencies.
 In *Proc. Workshop for NLP Open Source Software*.
 
 
-|details-end|
-
 .. _tfidf:
 
 Tf–idf term weighting

0 commit comments