Skip to content

Conversation

goraj
Copy link

@goraj goraj commented Nov 11, 2024

Hi all,

New to the project but I've been loosely following @adam2392 and the project for a while now.
I setup a dev environment according to DEVELOPMENT.md and ran into a few issues due to sklearn 1.6.dev0 being installed. Namely the introduction of check_sample_weight_equivalence in scikit-learn/scikit-learn@364cafe leads to expected but not skipped test-case failures.

What does this implement/fix? Explain your changes.

Changes will skip check_sample_weight_equivalence testing for forest implementations. It also addresses some un-pickling issues encountered during testing due to joblib/loky in the structure of treeple.stats.utils.
Changes should be backward compatible.

@goraj goraj changed the title [FIX] sklearn 1.6.dev0 adjustments. [WIP] sklearn 1.6.dev0 adjustments. Nov 12, 2024
@goraj goraj changed the title [WIP] sklearn 1.6.dev0 adjustments. [ENH] sklearn 1.6.dev0 adjustments. Nov 12, 2024
Copy link

codecov bot commented Nov 12, 2024

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.33%. Comparing base (e1c38ad) to head (f0f2a9e).
⚠️ Report is 25 commits behind head on main.

Files with missing lines Patch % Lines
treeple/ensemble/_honest_forest.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
- Coverage   80.50%   80.33%   -0.18%     
==========================================
  Files          24       24              
  Lines        2334     2339       +5     
  Branches      339      339              
==========================================
  Hits         1879     1879              
- Misses        318      322       +4     
- Partials      137      138       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@goraj
Copy link
Author

goraj commented Dec 9, 2024

Nothing?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @goraj and thanks for the PR!

I left a review. So I was thinking, should we just revamp how we're using the parametrize_with_checks function. Instead of labeling sample_weight_equivalence as an ignored test per area, perhaps let's introduce a test_common.py function under treeple/tests/, and then we can do the "sklearn compatible" check there. Then, we can consolidate what tests we want to ignore in the same file.

Kind of like here:

https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/utils/_test_common/instance_generator.py#L771

and how expected_failed_checks kwarg of parametrize_with_checks is used to ignore tests inside https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/tests/test_common.py#L118

WDYT?

Comment on lines +237 to +248
with parallel_config("multiprocessing"):
out = Parallel(n_jobs=n_jobs)(
delayed(_parallel_build_null_forests)(
y_pred_ind_arr,
n_estimators,
all_y_pred,
y_test,
seed,
metric,
**metric_kwargs,
)
for i, seed in zip(range(n_repeats), ss.spawn(n_repeats))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change made?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly the default loky would segfault during unit testing the *Oblique trees.

Comment on lines +516 to +517
with parallel_config("multiprocessing"):
out = Parallel(n_jobs=n_jobs)(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@goraj
Copy link
Author

goraj commented Dec 17, 2024

Sorry for the delay @goraj and thanks for the PR!

I left a review. So I was thinking, should we just revamp how we're using the parametrize_with_checks function. Instead of labeling sample_weight_equivalence as an ignored test per area, perhaps let's introduce a test_common.py function under treeple/tests/, and then we can do the "sklearn compatible" check there. Then, we can consolidate what tests we want to ignore in the same file.

Kind of like here:

https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/utils/_test_common/instance_generator.py#L771

and how expected_failed_checks kwarg of parametrize_with_checks is used to ignore tests inside https://github.com/scikit-learn/scikit-learn/blob/66270e46b77d6202559bae4929ec83ab320beb1e/sklearn/tests/test_common.py#L118

WDYT?

Thank you.
I agree that would help quite a bit. I will update the PR accordingly, just a bit busy right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants