Link to Paper

Setup environment and run models

Running a docker image

We recommend running the scripts from the publicly available Docker image, which you can pull with:

docker pull vcanogil/covalent-classifier:latest

Installing locally

Using the python environment manager of your choice (Python 3.10 is recommended):

# Example using conda env manager
conda create -n cov-classifier python=3.10
conda activate cov-classifier
pip install -r requirements.txt

Running the training script

To simply train a model, run either of the following from the main directory:

python models/graph/train.py

or

python models/fingerprint/train.py

To choose which fingerprint model is trained, edit the script source directly.

Loading the model and making a prediction

To load one of the graph models from the saved_models folder, run the following:

python models/graph/make_prediction.py Your_Smiles_Or_InChI_String

Note that the original function loads the model every time it makes a prediction, so for batch predictions we recommend altering the script to load the model once and reuse it.
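
The batch advice above amounts to loading the model once and reusing it for every input. A minimal sketch of that pattern follows. Here load_model and predict_one are hypothetical stand-ins for whatever loading and single-molecule prediction code you lift out of models/graph/make_prediction.py; they are not names taken from the repository.

```python
from typing import Callable, Iterable, List, TypeVar

Model = TypeVar("Model")
Result = TypeVar("Result")


def predict_batch(
    structures: Iterable[str],
    load_model: Callable[[], Model],
    predict_one: Callable[[Model, str], Result],
) -> List[Result]:
    """Load the model a single time, then predict every structure with it."""
    model = load_model()  # the expensive step, done once rather than per input
    return [predict_one(model, s) for s in structures]
```

Passing the loader as a callable keeps the sketch independent of how the saved model is actually stored.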

You can also compute the average structural Tanimoto similarity and pairwise distance to the training set by passing True as a second argument:

python models/graph/make_prediction.py Your_Smiles_Or_InChI_String True

Note that this will take longer than just predicting the label.
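
For reference, the Tanimoto similarity between two binary fingerprints is the number of shared on-bits divided by the number of bits set in either. The sketch below works on plain Python sets of on-bit indices and averages one query against a list of references. How the repository builds its fingerprints and aggregates the scores is not shown here, so treat the function names and the simple mean as illustrative assumptions.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def mean_tanimoto(query: Set[int], references: Iterable[Set[int]]) -> float:
    """Average similarity of one query fingerprint to a collection of references."""
    scores = [tanimoto(query, ref) for ref in references]
    if not scores:
        raise ValueError("references must not be empty")
    return sum(scores) / len(scores)


# Toy fingerprints: similarities are 3/4, 2/6 and 4/4 respectively.
query = {1, 4, 7, 9}
training = [{1, 4, 7}, {2, 4, 9, 11}, {1, 4, 7, 9}]
print(round(mean_tanimoto(query, training), 3))  # 0.694
```

Each query is compared against every training structure, which is why the similarity option takes longer than predicting the label alone.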

Generating a GradCAM heatmap

You can generate a class activation heatmap by running the following command from the main directory:

python models/graph/gradcam.py Your_Smiles_Or_InChI_String

This writes a file named gradcam_heatmap.png. To adapt the output to your needs, refer to models/graph/gradcam.py itself.

By default, the GCNII model is used. To produce the heatmap with a different model, change the argument to make_gradcam_heatmap in the models/graph/gradcam.py module, or train your own model.
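
As background, Grad-CAM scores each node by weighting the feature channels of a chosen layer by the average gradient of the class score with respect to that channel, summing over channels, and clipping negative values to zero. The sketch below applies that general recipe to plain lists of per-node features and gradients. It is not the code in models/graph/gradcam.py, and the function name and input layout are assumptions made for illustration.

```python
from typing import List, Sequence


def gradcam_node_scores(
    features: Sequence[Sequence[float]],
    gradients: Sequence[Sequence[float]],
) -> List[float]:
    """Grad-CAM importance per node from (n_nodes x n_channels) activations and gradients."""
    n_nodes = len(features)
    if n_nodes == 0 or len(gradients) != n_nodes:
        raise ValueError("features and gradients must be non-empty and the same length")
    n_channels = len(features[0])
    # Channel weight: gradient of the class score averaged over all nodes.
    weights = [
        sum(gradients[i][k] for i in range(n_nodes)) / n_nodes
        for k in range(n_channels)
    ]
    # Node score: ReLU of the weighted sum of that node's channel activations.
    return [
        max(0.0, sum(w * f for w, f in zip(weights, node)))
        for node in features
    ]


# Toy input: three nodes, two channels.
features = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
gradients = [[0.2, -0.1], [0.4, -0.3], [0.0, -0.2]]
print(gradcam_node_scores(features, gradients))
```

In this toy input only the first node keeps a positive score; the other two are clipped to zero, which is what produces the localized highlights in a heatmap.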

How to Cite

Researchers using these methods or ProteinReactiveDB should cite:

Cano Gil, V. H.; Rowley, C. N. Digital Discovery, 2024. DOI: https://doi.org/10.1039/D4DD00038B

Troubleshooting

If you load the models in your own code, you may encounter the following error:

ValueError: No TypeSpec has been registered with name 'molgraph.tensors.graph_tensor.GraphTensorSpec'

Running import molgraph in your code before loading the model should resolve this.
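
A sketch of that workaround: import molgraph before deserializing the model so that its tensor spec is registered. The use of tf.keras.models.load_model and the model_path argument are assumptions here; substitute whatever loading call and path your own code already relies on.

```python
def load_graph_model(model_path: str):
    """Load a saved graph model after making sure molgraph's types are registered."""
    # Imported for its side effect: per the troubleshooting note, importing
    # molgraph is what makes GraphTensorSpec known when the model is loaded.
    import molgraph  # noqa: F401
    import tensorflow as tf

    return tf.keras.models.load_model(model_path)
```

The imports sit inside the function only so the helper can be defined without pulling in TensorFlow up front; a plain top-level import molgraph works equally well.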

ProteinReactiveDB

The dataset built for this project is in data/InChI_all/training_data_all.csv. The data folder also includes the structures excluded from the various public databases used to build ProteinReactiveDB, as well as the test data grouped by type of structure.
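
The column layout of the CSV is not described here, so a quick way to check it before writing any parsing code is to read the header and the first few rows. The helper below is an illustrative sketch using only the standard library; the function name is hypothetical.

```python
import csv
from itertools import islice
from typing import List, Tuple


def peek_csv(path: str, n_rows: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Return the header and the first n_rows data rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(islice(reader, n_rows))
    return header, rows
```

Calling peek_csv("data/InChI_all/training_data_all.csv") from the main directory returns the column names and a small sample without reading the whole file into memory.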

Reference

Cano Gil, V. H.; Rowley, C. ChemRxiv 2023. doi: 10.26434/chemrxiv-2023-d0dqp
