Commit 7b6b657

StefanieSenger, adrinjalali, and ogrisel authored
DOC clearer definition of estimator to be used in last step of a pipeline (scikit-learn#26952)
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
1 parent 3f6bc8e commit 7b6b657

2 files changed: +32 -19 lines changed

doc/modules/compose.rst

Lines changed: 17 additions & 8 deletions
@@ -5,14 +5,23 @@
 Pipelines and composite estimators
 ==================================

-Transformers are usually combined with classifiers, regressors or other
-estimators to build a composite estimator. The most common tool is a
-:ref:`Pipeline <pipeline>`. Pipeline is often used in combination with
-:ref:`FeatureUnion <feature_union>` which concatenates the output of
-transformers into a composite feature space. :ref:`TransformedTargetRegressor
-<transformed_target_regressor>` deals with transforming the :term:`target`
-(i.e. log-transform :term:`y`). In contrast, Pipelines only transform the
-observed data (:term:`X`).
+To build a composite estimator, transformers are usually combined with other
+transformers or with :term:`predictors` (such as classifiers or regressors).
+The most common tool used for composing estimators is a :ref:`Pipeline
+<pipeline>`. Pipelines require all steps except the last to be a
+:term:`transformer`. The last step can be anything, a transformer, a
+:term:`predictor`, or a clustering estimator which might have or not have a
+`.predict(...)` method. A pipeline exposes all methods provided by the last
+estimator: if the last step provides a `transform` method, then the pipeline
+would have a `transform` method and behave like a transformer. If the last step
+provides a `predict` method, then the pipeline would expose that method, and
+given a data :term:`X`, use all steps except the last to transform the data,
+and then give that transformed data to the `predict` method of the last step of
+the pipeline. `Pipeline` is often used in combination with :ref:`Column
+Transformer <column_transformer>` or :ref:`FeatureUnion <feature_union>` which
+concatenate the output of transformers into a composite feature space.
+:ref:`TransformedTargetRegressor <transformed_target_regressor>` deals with
+transforming the :term:`target` (i.e. log-transform :term:`y`).

 .. _pipeline:

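The rewritten paragraph says that a pipeline exposes the methods of its last step. A minimal sketch of that behavior follows; it is an illustration and not part of the commit. The synthetic dataset, the step names (`"scale"`, `"clf"`, `"pca"`), and the choice of estimators are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Last step is a predictor: the pipeline exposes `predict`.
clf_pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
clf_pipe.fit(X, y)
predictions = clf_pipe.predict(X)  # X is scaled first, then passed to the classifier

# Last step is a transformer: the pipeline exposes `transform`.
trf_pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
X_reduced = trf_pipe.fit(X).transform(X)

# Methods the last step lacks are not exposed by the pipeline.
clf_has_transform = hasattr(clf_pipe, "transform")
trf_has_predict = hasattr(trf_pipe, "predict")
```

Here `clf_pipe` behaves like a classifier and `trf_pipe` behaves like a transformer, which matches the wording added to `doc/modules/compose.rst`.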
sklearn/pipeline.py

Lines changed: 15 additions & 11 deletions
@@ -53,12 +53,15 @@ def check(self):

 class Pipeline(_BaseComposition):
     """
-    Pipeline of transforms with a final estimator.
+    A sequence of data transformers with an optional final predictor.
+
+    `Pipeline` allows you to sequentially apply a list of transformers to
+    preprocess the data and, if desired, conclude the sequence with a final
+    :term:`predictor` for predictive modeling.

-    Sequentially apply a list of transforms and a final estimator.
     Intermediate steps of the pipeline must be 'transforms', that is, they
     must implement `fit` and `transform` methods.
-    The final estimator only needs to implement `fit`.
+    The final :term:`estimator` only needs to implement `fit`.
     The transformers in the pipeline can be cached using ``memory`` argument.

     The purpose of the pipeline is to assemble several steps that can be
@@ -81,10 +84,11 @@ class Pipeline(_BaseComposition):

     Parameters
     ----------
-    steps : list of tuple
-        List of (name, transform) tuples (implementing `fit`/`transform`) that
-        are chained in sequential order. The last transform must be an
-        estimator.
+    steps : list of tuples
+        List of (name of step, estimator) tuples that are to be chained in
+        sequential order. To be compatible with the scikit-learn API, all steps
+        must define `fit`. All non-last steps must also define `transform`. See
+        :ref:`Combining Estimators <combining_estimators>` for more details.

     memory : str or object with the joblib.Memory interface, default=None
         Used to cache the fitted transformers of the pipeline. The last step
@@ -414,7 +418,7 @@ def _fit(self, X, y=None, routed_params=None):
     def fit(self, X, y=None, **params):
         """Fit the model.

-        Fit all the transformers one after the other and transform the
+        Fit all the transformers one after the other and sequentially transform the
         data. Finally, fit the transformed data using the final estimator.

         Parameters
@@ -478,9 +482,9 @@ def _can_fit_transform(self):
     def fit_transform(self, X, y=None, **params):
         """Fit the model and transform with the final estimator.

-        Fits all the transformers one after the other and transform the
-        data. Then uses `fit_transform` on transformed data with the final
-        estimator.
+        Fit all the transformers one after the other and sequentially transform
+        the data. Only valid if the final estimator either implements
+        `fit_transform` or `fit` and `transform`.

         Parameters
         ----------
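
The updated `fit` and `fit_transform` docstrings describe a simple control flow: fit and apply each intermediate transformer in order, then hand the transformed data to the final step. The toy sketch below mirrors that description in plain Python. It is not scikit-learn's implementation; `MiniPipeline`, `Center`, `Double`, and `FitOnly` are hypothetical names invented for this illustration.

```python
class Center:
    """Hypothetical transformer: subtracts the mean learned in `fit`."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class Double:
    """Hypothetical stateless transformer: doubles every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [2 * x for x in X]


class FitOnly:
    """Hypothetical final step that implements only `fit`."""

    def fit(self, X, y=None):
        self.seen_ = list(X)
        return self


class MiniPipeline:
    """Toy model of the documented behavior, not the real Pipeline."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples

    def _fit_transformers(self, X, y=None):
        # Fit each non-last step, then transform the data with it.
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        return X

    def fit(self, X, y=None):
        Xt = self._fit_transformers(X, y)
        self.steps[-1][1].fit(Xt, y)  # the final step only needs `fit`
        return self

    def fit_transform(self, X, y=None):
        Xt = self._fit_transformers(X, y)
        last = self.steps[-1][1]
        if hasattr(last, "fit_transform"):
            return last.fit_transform(Xt, y)
        if hasattr(last, "transform"):
            return last.fit(Xt, y).transform(Xt)
        raise AttributeError("final step has neither fit_transform nor transform")


data = [1.0, 2.0, 3.0]

# Final step is a transformer, so `fit_transform` is valid.
result = MiniPipeline([("center", Center()), ("double", Double())]).fit_transform(data)

# Final step implements only `fit`: `fit` works and receives transformed data.
fit_only = FitOnly()
fit_only_pipe = MiniPipeline([("center", Center()), ("final", fit_only)])
fit_only_pipe.fit(data)
```

In this sketch, calling `fit_only_pipe.fit_transform(data)` raises `AttributeError`, which corresponds to the "only valid if" condition in the new docstring.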
