`n_jobs` support details in docs #2453

Alexsandruss · 2025-04-25T16:14:34Z

Description

Adds a doc page for n_jobs specifics of sklearnex.

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

Performance

N/A

codecov · 2025-04-25T17:29:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

| Flag | Coverage Δ |
|---|---|
| azure | `?` |
| github | `71.96% <ø> (ø)` |

see 41 files with indirect coverage changes

david-cortes-intel · 2025-05-05T07:27:44Z

doc/sources/parallelism.rst

@@ -0,0 +1,46 @@
+.. Copyright 2025 Intel Corporation


We're using a different copyright header now: https://github.com/uxlfoundation/oneDAL/blob/main/CONTRIBUTING.md#license-and-copyright

david-cortes-intel · 2025-05-05T07:28:40Z

doc/sources/parallelism.rst

+* `n_jobs` parameter is supported for all estimators patched by |sklearnex|,
+  while |sklearn| enables it for selected estimators only
+* `n_jobs` estimator parameter sets the number of threads used by the underlying |oneDAL|
+* |sklearnex| doesn't use `joblib` for parallelism in patched estimators and functions
+* The only low-level parallelism library used by |sklearnex| is oneTBB (through oneDAL)
+* The `threading` parallel backend of `joblib` is not supported by |sklearnex|


david-cortes-intel · 2025-05-05T07:30:25Z

doc/sources/parallelism.rst

+* The only low-level parallelism library used by |sklearnex| is oneTBB (through oneDAL)
+* The `threading` parallel backend of `joblib` is not supported by |sklearnex|
+
+The only exception is multiclass LogisticRegression, which uses `joblib` for parallelism across classes.


Suggested change

The only exception is multiclass LogisticRegression, which uses `joblib` for parallelism across classes.

The only exception is multiclass LogisticRegression, which uses :external+joblib:doc:`joblib <index>` for parallelism across classes.

(perhaps it could be added to the substitutions list)

david-cortes-intel · 2025-05-05T07:31:18Z

doc/sources/parallelism.rst

+  while |sklearn| enables it for selected estimators only
+* `n_jobs` estimator parameter sets the number of threads used by the underlying |oneDAL|
+* |sklearnex| doesn't use `joblib` for parallelism in patched estimators and functions
+* The only low-level parallelism library used by |sklearnex| is oneTBB (through oneDAL)


There's also multi-threading from the MKL side.

david-cortes-intel · 2025-05-05T07:32:36Z

doc/sources/parallelism.rst

+The only exception is multiclass LogisticRegression, which uses `joblib` for parallelism across classes.
+
+|sklearnex| follows the same rules as |sklearn| for
+`the calculation of the 'n_jobs' parameter value <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`_.


Suggested change

`the calculation of the 'n_jobs' parameter value <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`_.

`the calculation of the 'n_jobs' parameter value <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`__.

The link is repeated, and a single underscore makes it a named reference, which can cause problems when repetitions change the name.
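
For readers unfamiliar with the glossary rule linked above, here is a minimal sketch of how a negative integer `n_jobs` maps to a worker count under the scikit-learn convention (`n_cpus + 1 + n_jobs`). The helper name `resolve_n_jobs` is hypothetical and is not part of sklearnex or scikit-learn. Rejecting `0` mirrors joblib and is only an assumption as far as sklearnex is concerned. Handling of `n_jobs=None` is left out on purpose, since it depends on context.

```python
import os
from typing import Optional


def resolve_n_jobs(n_jobs: int, n_cpus: Optional[int] = None) -> int:
    """Hypothetical helper: map an integer n_jobs to a positive worker count."""
    if n_cpus is None:
        # os.cpu_count() may return None on unusual platforms.
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        # Assumption: treat 0 as an error, as joblib does.
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on; never below 1.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
print(resolve_n_jobs(4, n_cpus=8))   # 4
```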

david-cortes-intel · 2025-05-05T08:00:36Z

doc/sources/parallelism.rst

+|sklearnex| supports the `n_jobs <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`_ parameter
+of the original |sklearn| with the following differences:


david-cortes-intel · 2025-05-05T08:01:20Z

doc/sources/parallelism.rst

+Environment variables such as `OMP_NUM_THREADS`, `MKL_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and others used by
+low-level parallelism libraries are recognized by `joblib` and therefore can be used as hints by |sklearnex|.
+
+To track the actual number of threads used by sklearnex's estimators,


There's also the MKL debug variable, and now the oneDAL debug variable.
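
To make the environment-variable discussion concrete, the following sketch only inspects which of the variables named in the diff are set to a usable value in the current process. It does not reproduce how `joblib` or sklearnex interpret them. The helper `read_thread_hints` and the tuple `THREAD_ENV_VARS` are hypothetical names used for illustration.

```python
import os
from typing import Dict, Mapping

# Variables named in the doc page under review.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def read_thread_hints(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Hypothetical helper: collect thread-count hints that parse as positive integers."""
    hints = {}
    for name in THREAD_ENV_VARS:
        raw = env.get(name, "").strip()
        # Values that are not plain positive integers (for example the
        # comma-separated form OMP_NUM_THREADS accepts) are skipped here.
        if raw.isdigit() and int(raw) > 0:
            hints[name] = int(raw)
    return hints


print(read_thread_hints({"OMP_NUM_THREADS": "4", "MKL_NUM_THREADS": "abc"}))
# {'OMP_NUM_THREADS': 4}
```

Calling `read_thread_hints()` with no argument reads the real process environment, which can help when comparing the configured hints against the thread count actually observed.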

david-cortes-intel · 2025-05-05T08:08:48Z

doc/sources/parallelism.rst

+|sklearnex| follows the same rules as |sklearn| for
+`the calculation of the 'n_jobs' parameter value <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`_.
+
+When Scikit-learn's utilities with built-in parallelism are used (for example, `GridSearchCV` or `VotingClassifier`),


Suggested change

When Scikit-learn's utilities with built-in parallelism are used (for example, `GridSearchCV` or `VotingClassifier`),

When Scikit-learn's utilities with built-in parallelism are used (for example, :obj:`sklearn.model_selection.GridSearchCV` or :obj:`sklearn.ensemble.VotingClassifier`),

david-cortes-intel · 2025-05-05T08:10:08Z

doc/sources/parallelism.rst

+`the calculation of the 'n_jobs' parameter value <https://scikit-learn.org/stable/glossary.html#term-n_jobs>`_.
+
+When Scikit-learn's utilities with built-in parallelism are used (for example, `GridSearchCV` or `VotingClassifier`),
+|sklearnex| tries to determine the optimal number of threads per job using hints provided by `joblib`.


I wasn't aware that such a system existed. Could you provide a link to the code where this happens?

david-cortes-intel · 2025-05-05T08:15:36Z

doc/sources/parallelism.rst

+* `n_jobs` estimator parameter sets the number of threads used by the underlying |oneDAL|
+* |sklearnex| doesn't use `joblib` for parallelism in patched estimators and functions
+* The only low-level parallelism library used by |sklearnex| is oneTBB (through oneDAL)
+* The `threading` parallel backend of `joblib` is not supported by |sklearnex|


What does "not supported by sklearnex" mean? For example, what happens if you run an sklearn metaestimator (like BaggingClassifier) using that joblib backend with an sklearnex estimator inside?

david-cortes-intel · 2025-05-05T08:19:25Z

Thanks for adding these explanations. But it's still missing important pieces of information and leaves several questions unanswered:

It's missing the threading part of MKL, the static linkage part, and how it interacts with environment variables, n_jobs parameter, inner_max_num_threads parameter, and threadpoolctl configurations.
It doesn't mention how the threading works when put under a threadpoolctl context.
The explanation is unclear about what ends up happening with the number of threads when using environment variables in addition to passing n_jobs as a parameter.
It could mention what happens with n_jobs in GPU mode.
There's a difference in the threading configuration logic between daal4py and sklearnex, which this doc could also mention.
It doesn't cover the part about some configurations being global, which is quite relevant when using python-based multi-threading.
It could mention that the TBB threading doesn't automatically avoid nested parallelism when used in conjunction with OpenMP (which sklearn uses) and/or with joblib or python threads.
Some estimators perform better when not using all threads. For example, linear regression is faster on LNL laptops when not using the low-power E-cores. Perhaps the doc could mention this sort of thing, as it is relevant here.
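
The nested-parallelism concern can be illustrated with simple arithmetic. The sketch below is not sklearnex's logic; it assumes a naive even split of CPUs across outer jobs, and both function names are hypothetical. It shows how the total thread count grows when each outer worker starts its own full-size inner thread pool, and what an even split would give instead.

```python
def total_threads(n_outer_jobs: int, n_inner_threads: int) -> int:
    """Threads alive at once when every outer job runs its own inner pool."""
    return n_outer_jobs * n_inner_threads


def threads_per_job(n_cpus: int, n_outer_jobs: int) -> int:
    """Hypothetical even split of CPUs across outer jobs, at least 1 each."""
    if n_cpus < 1 or n_outer_jobs < 1:
        raise ValueError("n_cpus and n_outer_jobs must be positive")
    return max(n_cpus // n_outer_jobs, 1)


n_cpus, n_outer_jobs = 8, 4

# No coordination: each of the 4 outer jobs uses all 8 CPUs.
print(total_threads(n_outer_jobs, n_cpus))  # 32

# Even split: each outer job gets 2 inner threads, 8 threads in total.
inner = threads_per_job(n_cpus, n_outer_jobs)
print(inner, total_threads(n_outer_jobs, inner))  # 2 8
```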

icfaust · 2025-05-26T14:41:43Z

@Alexsandruss make sure to merge main for latest CI checks on docs

n_jobs support details in docs

eca279b

Alexsandruss added the documentation label Apr 25, 2025

Fixes for doc page

c11fcfe

Alexsandruss marked this pull request as ready for review April 25, 2025 17:00

Alexsandruss requested review from maria-Petrova, napetrov, icfaust and david-cortes-intel as code owners April 25, 2025 17:00

david-cortes-intel reviewed May 5, 2025
