Feature rdkit_scaled and mordred_filtered_scaled feature sets #397
Closed
…scaler_unit_variance
…ault true and default false lists
Ipc should not be changed to AvgIpc like this because it would break all rdkit_raw models.
…th RobustScaler and PowerTransformer. Updated documentation in related sections. Added functions to ModelFileReader to read out transformer specific parameters. Changed models that test RobustScaler and PowerTransformer to use RF to speed up the training
…default function for all sklearn parameters
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##            1.7.0     #397      +/-   ##
==========================================
- Coverage   34.54%   29.95%   -4.60%
==========================================
  Files          50       50
  Lines       13497    13564      +67
==========================================
- Hits         4663     4063     -600
- Misses       8834     9501     +667
... and 7 files with indirect coverage changes
… it more generalizable. Fixed tests. Fixed bug where the imputer_strategy parameter was not used
…e log scale features
…ndicator' flag because that changed the number of features and crashed.
…rming using SklearnPipelineWrapper
This is built on top of PR #396.
This adds new versions of the rdkit and mordred feature sets in which each feature whose Pearson correlation with the heavy atom count exceeds 0.5 is scaled by the number of heavy atoms. Power-law relationships between individual features and the heavy atom count were also considered, but no clear relationships were found.
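A minimal sketch of the selection-and-scaling rule described above, assuming that "scaled by the number of heavy atoms" means dividing the feature value by the heavy atom count, and that the heavy atom counts are already available as an array. The helpers `select_correlated_features` and `scale_by_heavy_atoms` are hypothetical names for illustration, not functions from this PR; only NumPy is used so the example runs without rdkit or mordred installed.

```python
import numpy as np


def select_correlated_features(X, heavy_atoms, threshold=0.5):
    """Return indices of columns whose Pearson r with heavy_atoms exceeds threshold.

    Hypothetical helper: uses the signed correlation (r > threshold), as stated in the PR text.
    """
    X = np.asarray(X, dtype=float)
    h = np.asarray(heavy_atoms, dtype=float)
    selected = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # Correlation is undefined for constant columns, so skip them.
        if np.std(col) == 0 or np.std(h) == 0:
            continue
        r = np.corrcoef(col, h)[0, 1]
        if r > threshold:
            selected.append(j)
    return selected


def scale_by_heavy_atoms(X, heavy_atoms, columns):
    """Divide the selected columns by the heavy atom count (assumed meaning of 'scaled')."""
    X_scaled = np.array(X, dtype=float, copy=True)
    h = np.asarray(heavy_atoms, dtype=float)
    # Guard against rows with zero heavy atoms by leaving them unscaled.
    safe_h = np.where(h > 0, h, 1.0)
    X_scaled[:, columns] = X_scaled[:, columns] / safe_h[:, None]
    return X_scaled


# Synthetic example: column 0 grows with heavy atom count, column 1 does not,
# column 2 is constant.
heavy = np.array([5, 10, 15, 20])
X = np.column_stack([2 * heavy + 1, [3, 1, 4, 1], np.ones(4)])

cols = select_correlated_features(X, heavy)
print(cols)  # → [0]
X_scaled = scale_by_heavy_atoms(X, heavy, cols)
```

In this example column 0 is selected, column 1 is negatively correlated with heavy atom count and left unchanged, and the constant column 2 is skipped because its correlation is undefined.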
Features based on VSA (van der Waals surface area) are scaled as well, even though individual VSA features may not correlate with the heavy atom count because of binning. In rdkit, some VSA features bin the VSA by a property, and others do the reverse. The contents of an individual bin may not correlate with the heavy atom count, but the sum over all bins correlates strongly with it.
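To illustrate the binning argument, the sketch below uses made-up bin values (not real rdkit descriptors) for four hypothetical molecules whose total surface area grows in proportion to heavy atom count but is split between two bins depending on atom type. `pearson` and `bin_and_total_correlations` are hypothetical helpers written only for this illustration.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


def bin_and_total_correlations(bins, heavy_atoms):
    """Per-bin Pearson r with heavy atom count, plus r for the row-wise sum of all bins."""
    bins = np.asarray(bins, dtype=float)
    per_bin = [pearson(bins[:, j], heavy_atoms) for j in range(bins.shape[1])]
    total = pearson(bins.sum(axis=1), heavy_atoms)
    return per_bin, total


# Synthetic data: each row is a molecule, each column a VSA bin (arbitrary units).
# The total area is 10 * heavy atom count, but it lands in different bins.
heavy = np.array([4, 8, 12, 16])
bins = np.array([
    [40, 0],
    [0, 80],
    [120, 0],
    [0, 160],
])

per_bin, total = bin_and_total_correlations(bins, heavy)
print(per_bin, total)  # first bin r ≈ 0, total r ≈ 1
```

Judged bin by bin, the first bin would fail a 0.5 correlation threshold, while the sum over bins is perfectly correlated with heavy atom count, which is the situation the rationale above describes.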
Benefits of these new features are still being evaluated.