XBastille
Contributor

This PR addresses issue #162 where @asdf32768 couldn't use outlier detection algorithms like IsolationForest with GASearchCV. The library was throwing this error:

ValueError: IsolationForest() is not a valid Sklearn classifier or regressor

This happened because the validation logic only accepted classifiers and regressors, but outlier detection algorithms are a separate category in scikit-learn. This PR extends the library to support outlier detection algorithms alongside the existing classifier and regressor support. The implementation handles the unique characteristics of outlier detection:

  • Unsupervised learning: Many outlier detectors don't require target labels (y can be None)
  • Different scoring methods: Outlier detectors use score_samples(), decision_function(), or fit_predict() instead of the standard score() method
  • Cross-validation considerations: Outlier detection needs different CV handling than supervised learning
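
As a rough illustration of these characteristics (plain scikit-learn usage, not code from this PR), the sketch below fits an IsolationForest without labels and calls the three scoring-related methods. The synthetic data, the variable names, and the parameter values are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.base import is_classifier, is_outlier_detector, is_regressor
from sklearn.ensemble import IsolationForest

# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))

detector = IsolationForest(n_estimators=50, random_state=0)

# scikit-learn reports the estimator category through these helpers.
print(is_classifier(detector), is_regressor(detector), is_outlier_detector(detector))

# fit() is called without target labels (y defaults to None).
detector.fit(X)

# Check whether the conventional score(X, y) method exists on this estimator.
print(hasattr(detector, "score"))

scores = detector.score_samples(X)       # lower values are more abnormal
margins = detector.decision_function(X)  # negative values indicate outliers
labels = detector.fit_predict(X)         # refits, then returns +1 (inlier) or -1 (outlier)
```

Because none of these methods takes a target, a search wrapper has to tolerate y=None all the way through fitting and scoring.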

What Changed

Core Changes in genetic_search.py:

  • Updated the validation checks in both GASearchCV and GAFeatureSelectionCV constructors to accept outlier detectors
  • Modified the fit() methods to handle cases where y=None for unsupervised outlier detection
  • Added logic to create appropriate default scorers for outlier detectors when no scoring function is provided
  • Enhanced cross-validation setup to work properly with outlier detection algorithms
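
A minimal sketch of what the extended validation check could look like, assuming it is built on scikit-learn's `is_classifier`, `is_regressor`, and `is_outlier_detector` helpers. The function name `check_supported_estimator` and the exact error wording are hypothetical and may differ from what genetic_search.py actually does.

```python
from sklearn.base import is_classifier, is_outlier_detector, is_regressor
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest


def check_supported_estimator(estimator):
    """Hypothetical helper: accept classifiers, regressors, and outlier detectors."""
    if not (
        is_classifier(estimator)
        or is_regressor(estimator)
        or is_outlier_detector(estimator)
    ):
        # Illustrative wording only; the library's actual message may differ.
        raise ValueError(
            f"{estimator} is not a valid Sklearn classifier, regressor, "
            "or outlier detector"
        )
    return estimator


# An outlier detector passes the check.
check_supported_estimator(IsolationForest())

# A clusterer is none of the three categories, so it is rejected.
try:
    check_supported_estimator(KMeans())
except ValueError as err:
    print(err)
```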

New Test File:

  • Created a comprehensive test suite (test_outliner_detection.py) covering IsolationForest, OneClassSVM, and LocalOutlierFactor
  • Tests include both GASearchCV and GAFeatureSelectionCV functionality
  • Added tests for custom scoring functions and error handling
  • Verified that cv_results structure works correctly with outlier detectors

Updated Existing Tests:

  • Fixed two existing test assertions that expected the old error message format
  • Tests now expect the updated error message that includes outlier detectors

All tests pass, including the new outlier detection test suite. I verified that the exact use case from issue #162 now works correctly. Users can now optimize hyperparameters for IsolationForest and other outlier detectors using the genetic algorithm approach.

The implementation maintains full backward compatibility, so existing code continues to work exactly as before.

Usage Example

After this change, users can do exactly what was requested in the original issue:

from sklearn.ensemble import IsolationForest
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Integer, Categorical

estimator = IsolationForest()
param_grid = {
    'contamination': Continuous(0.001, 0.5, distribution='log-uniform'),
    'n_estimators': Integer(100, 1000),
    'max_samples': Integer(1, 1000),
    'max_features': Integer(1, 10),
    'bootstrap': Categorical([True, False])
}

ga_search = GASearchCV(estimator=estimator, param_grid=param_grid)
ga_search.fit(X_train)  # X_train is your feature matrix; no y is needed for an outlier detector

This works for any scikit-learn outlier detector, including IsolationForest, OneClassSVM, LocalOutlierFactor, and EllipticEnvelope.

Implementation Notes

The approach I took was to extend the existing validation and scoring logic rather than creating separate code paths. The default scoring for outlier detectors prioritizes score_samples() when available (which provides the anomaly scores), falls back to decision_function(), and finally uses fit_predict() as a last resort.
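
The fallback order described above can be sketched roughly as follows. This is not the code from the PR: the function name `default_outlier_score` is hypothetical, and reducing the per-sample values to a single number with a mean is an assumption made for the example. The toy classes stand in for real detectors so the snippet runs without any dependencies.

```python
from statistics import fmean


def default_outlier_score(estimator, X, y=None):
    """Hypothetical default scorer following the fallback order described above."""
    if hasattr(estimator, "score_samples"):
        # Preferred: per-sample anomaly scores.
        return fmean(estimator.score_samples(X))
    if hasattr(estimator, "decision_function"):
        # Second choice: signed distance to the decision boundary.
        return fmean(estimator.decision_function(X))
    # Last resort: fit_predict() refits on X and returns +1/-1 labels.
    return fmean(estimator.fit_predict(X))


class WithScoreSamples:
    """Toy detector exposing both score_samples() and decision_function()."""

    def score_samples(self, X):
        return [-0.25 for _ in X]

    def decision_function(self, X):
        return [0.75 for _ in X]


class OnlyDecisionFunction:
    """Toy detector without score_samples()."""

    def decision_function(self, X):
        return [0.5 for _ in X]


class OnlyFitPredict:
    """Toy detector that only offers fit_predict()."""

    def fit_predict(self, X):
        return [1 for _ in X]


X_demo = [[0.0], [1.0], [2.0], [3.0]]
print(default_outlier_score(WithScoreSamples(), X_demo))
print(default_outlier_score(OnlyDecisionFunction(), X_demo))
print(default_outlier_score(OnlyFitPredict(), X_demo))
```

Any object with the same method names, such as a fitted scikit-learn detector, would take the same path through the function. LocalOutlierFactor in its default novelty=False mode is the kind of estimator that would likely reach the last branch, since it offers fit_predict() but hides the other two methods in that mode.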

@rodrigo-arenas
Owner

Hi @XBastille thanks, it looks good, I'll test it locally and let you know if I have any feedback

@XBastille
Contributor Author

Yeah sure @rodrigo-arenas !! Please take a look and feel free to let me know!!!

@XBastille
Contributor Author

Hi @rodrigo-arenas! I am seeing a test fail because the PR does not reach the required coverage. Am I supposed to refactor something? Please let me know.

@rodrigo-arenas
Owner

Hi @XBastille, it failed because the code coverage is below 95%. Please check whether any of the lines you added have no test coverage; you can see that in the failure report, or by running the tests locally. Thanks

@XBastille
Contributor Author

XBastille commented Jun 5, 2025

Hi @rodrigo-arenas, I have updated the PR and the required test coverage of 95% is now reached. Total coverage: 95.12%. Kindly check!

@codecov

codecov bot commented Jun 6, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 5 lines in your changes missing coverage. Please review.

Project coverage is 95.37%. Comparing base (f41f555) to head (d339cda).
Report is 7 commits behind head on master.

Files with missing lines | Patch % | Lines
sklearn_genetic/genetic_search.py | 90.90% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #163      +/-   ##
==========================================
- Coverage   95.39%   95.37%   -0.03%     
==========================================
  Files          26       26              
  Lines        1151     1189      +38     
==========================================
+ Hits         1098     1134      +36     
- Misses         53       55       +2     


@XBastille
Contributor Author

Hi @rodrigo-arenas, it seems that the Codecov report is showing 85% coverage, while all other tests are passing. Could you please advise how I can run the Codecov-related tests locally to investigate this further? In the meantime, I’ve pushed another commit. Kindly re-run the tests when convenient. Thank you!

@rodrigo-arenas rodrigo-arenas merged commit 931133c into rodrigo-arenas:master Jun 10, 2025
10 of 11 checks passed
@rodrigo-arenas
Owner

thanks @XBastille I merged this and it will be on the next release

@XBastille
Contributor Author

@rodrigo-arenas Awesome, thank you for merging! Happy to contribute. Let me know if anything needs tweaking or if there is any other request, and I will try to contribute!

@rodrigo-arenas
Owner

@XBastille thanks! There are a few open issues. If you want to take a look at any of those, it'd be of great help. I'm also open to suggestions for new features

@XBastille
Contributor Author

Ahh, I see. Alright then, I will see what I can do and let you know. Thank you @rodrigo-arenas 🙏🏼
