-
Notifications
You must be signed in to change notification settings - Fork 67
Check for ECFP collisions #330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…will receive a warning when that happens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- do you need to featurize the smiles or can you call this after the features have been calculated for the model/etc?
- Might be worth adding a recommendation to the warning like, "we recommend trying a larger ECFP radius to reduce collisions"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Amanda, it would be helpful to suggest increasing the ECFP radius in the warning message. Otherwise the code looks fine.
…caped '\C' in smiles strings. Added recommendation to warning message to increase radius or size to help reduce collisions
I have two followup questions:
|
When would this need to happen? I think in general, no. In specific pandas operations (str.replace for instance) you can specify regex=False but I don't think any of that applies here. We would be getting errors and failures all the time if this was needed right? |
When you specify a SMILES string as a string literal (e.g., |
Sometimes different smiles result in the same ECFP features. Now you will receive a warning when that happens