[FEATURE] Add support for outlier detectors (GASearchCV & GAFeatureSelectionCV) - Fixes #162 #163
Hi @XBastille, thanks, it looks good. I'll test it locally and let you know if I have any feedback.
Yeah sure @rodrigo-arenas!! Please take a look and feel free to let me know!
Hi @rodrigo-arenas! I am seeing a test fail because the PR does not reach the required coverage. Am I supposed to refactor something? Please let me know.
Hi @XBastille, it failed because the code coverage is below 95%. Please check whether some of the lines you added have no test coverage. You can see that in the failure report, or by running the tests locally. Thanks.
Hi @rodrigo-arenas, I have updated the PR and the required test coverage of 95% is reached. Total coverage: 95.12%. Kindly check!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #163      +/-   ##
==========================================
- Coverage   95.39%   95.37%   -0.03%
==========================================
  Files          26       26
  Lines        1151     1189      +38
==========================================
+ Hits         1098     1134      +36
- Misses         53       55       +2
Hi @rodrigo-arenas, it seems that the Codecov report is showing 85% coverage, while all other tests are passing. Could you please advise how I can run the Codecov-related tests locally to investigate this further? In the meantime, I've pushed another commit. Kindly re-run the tests when convenient. Thank you!
Thanks @XBastille, I merged this and it will be in the next release.
@rodrigo-arenas Awesome!! Thank you for merging, happy to contribute! Let me know if anything needs tweaking or if there are any other requests, and I will try to contribute!
@XBastille thanks! There are a few open issues. If you want to take a look at any of those, it'd be of great help. I'm also open to suggestions for new features.
Ahh, I see. Alright then, I will see what I can do and let you know. Thank you @rodrigo-arenas 🙏🏼
This PR addresses issue #162 where @asdf32768 couldn't use outlier detection algorithms like IsolationForest with GASearchCV. The library was throwing this error:
This happened because the validation logic only accepted classifiers and regressors, but outlier detection algorithms are a separate category in scikit-learn. I've extended the library to support outlier detection algorithms alongside the existing classifier and regressor support. The implementation handles the unique characteristics of outlier detection:
What Changed
Core Changes in genetic_search.py:
New Test File:
Updated Existing Tests:
All tests pass, including the new outlier detection test suite. I verified that the exact use case from issue #162 now works correctly. Users can now optimize hyperparameters for IsolationForest and other outlier detectors using the genetic algorithm approach.
The implementation maintains full backward compatibility, so existing code continues to work exactly as before.
Usage Example
After this change, users can do exactly what was requested in the original issue:
This works for any scikit-learn outlier detector, including IsolationForest, OneClassSVM, LocalOutlierFactor, and EllipticEnvelope.
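To illustrate what "any scikit-learn outlier detector" means in practice, here is a minimal sketch of a validation check that accepts all three estimator categories. The helper `check_supported` is hypothetical and is not the code in genetic_search.py; it only relies on the `is_classifier`, `is_regressor`, and `is_outlier_detector` helpers from `sklearn.base`.

```python
from sklearn.base import is_classifier, is_outlier_detector, is_regressor
from sklearn.cluster import KMeans
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def check_supported(estimator):
    """Hypothetical validator (illustration only, not the library's code).

    Accepts classifiers, regressors, and outlier detectors; rejects the rest.
    """
    if (
        is_classifier(estimator)
        or is_regressor(estimator)
        or is_outlier_detector(estimator)
    ):
        return estimator
    raise ValueError(
        f"{type(estimator).__name__} is not a classifier, regressor, "
        "or outlier detector"
    )


# The four detectors named above are all recognized as outlier detectors.
detectors = [
    IsolationForest(),
    OneClassSVM(),
    LocalOutlierFactor(),
    EllipticEnvelope(),
]
accepted = [type(check_supported(d)).__name__ for d in detectors]

# A clusterer such as KMeans belongs to none of the three categories.
try:
    check_supported(KMeans())
    kmeans_rejected = False
except ValueError:
    kmeans_rejected = True
```

A check of this shape keeps a single validation path: adding a new category means extending one condition rather than branching elsewhere.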
Implementation Notes
The approach I took was to extend the existing validation and scoring logic rather than creating separate code paths. The default scoring for outlier detectors prioritizes score_samples() when available (which provides the anomaly scores), falls back to decision_function(), and finally uses fit_predict() as a last resort.
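The fallback order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the function name `default_outlier_score`, the stub classes, and the choice to reduce per-sample scores to their mean (and labels to the inlier fraction) are all hypothetical.

```python
from statistics import fmean


def default_outlier_score(estimator, X, y=None):
    """Hypothetical default scorer following the fallback order above.

    Assumption: per-sample scores are reduced to a scalar by taking the mean.
    """
    if hasattr(estimator, "score_samples"):
        # Preferred: per-sample anomaly scores (lower means more abnormal).
        return fmean(estimator.score_samples(X))
    if hasattr(estimator, "decision_function"):
        # Second choice: shifted scores from decision_function.
        return fmean(estimator.decision_function(X))
    # Last resort: fit_predict labels, +1 for inliers and -1 for outliers.
    labels = estimator.fit_predict(X)
    return sum(1 for label in labels if label == 1) / len(labels)


# Stub detectors (hypothetical) that each expose only one of the methods.
class ScoresOnly:
    def score_samples(self, X):
        return [-0.5 for _ in X]


class DecisionOnly:
    def decision_function(self, X):
        return [0.25, 0.75]


class LabelsOnly:
    def fit_predict(self, X):
        return [1, 1, 1, -1]


X = [[0.0], [1.0], [2.0], [3.0]]
score_a = default_outlier_score(ScoresOnly(), X)    # uses score_samples
score_b = default_outlier_score(DecisionOnly(), X)  # uses decision_function
score_c = default_outlier_score(LabelsOnly(), X)    # uses fit_predict
```

The `hasattr` checks matter for estimators such as LocalOutlierFactor, which in its default (non-novelty) mode exposes neither `score_samples` nor `decision_function`, so only the `fit_predict` branch applies.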