rnepal2/Solubility-Prediction-with-Graph-Neural-Networks

Prediction of Aqueous Solubility of Drug Molecules

Overview

Predicting the aqueous solubility of small organic molecules is a critical step in early-stage drug discovery, influencing a compound's absorption, distribution, metabolism, and excretion (ADME) properties. This project explores and compares two primary computational approaches for classifying molecular solubility:

  1. Descriptor-based Quantitative Structure-Property Relationship (QSPR) Modeling: Utilizes molecular descriptors generated by RDKit with a LightGBM classifier.
  2. Graph Convolutional Neural Networks (GCNs): Employs graph representations of molecules with PyTorch Geometric to learn predictive features.
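To make the second approach more concrete, the sketch below shows what a single graph-convolution layer computes, written in plain Python with no dependencies. It is an illustration only and is not the architecture defined in model.py: the function names (matmul, gcn_layer), the ethanol example, the one-hot atom features, and the identity weight matrix are all assumptions chosen for readability. PyTorch Geometric provides a GCNConv layer that applies this kind of propagation rule with learned weights; whether this project's model uses that exact layer is not stated here.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adjacency)
    # Add self-loops so every atom keeps a share of its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Node degrees of the self-looped graph.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization keeps high-degree atoms from dominating.
    norm = [
        [inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy molecule: heavy atoms of ethanol, C-C-O.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
# Assumed atom features: one-hot [is_carbon, is_oxygen].
features = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]
# Identity weights so the effect of neighbor aggregation stays visible.
weights = [
    [1.0, 0.0],
    [0.0, 1.0],
]

hidden = gcn_layer(adjacency, features, weights)

# Mean pooling turns per-atom vectors into one molecule-level vector,
# which a classifier head could then map to a solubility class.
graph_embedding = [sum(col) / len(hidden) for col in zip(*hidden)]
```

After one layer the middle carbon already carries a nonzero "oxygen" component, because it has absorbed information from its bonded neighbor. Stacking layers widens that neighborhood, which is how a GCN can learn features without hand-crafted descriptors.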

Data Source

Model Highlights

Example Molecules

Example molecules from the dataset.

Hyperparameter Tuning

Visualization from GCN hyperparameter tuning.

Dependencies

  • PyTorch
  • PyTorch-Geometric
  • RDKit
  • DeepChem
  • Scikit-Learn
  • LightGBM
  • Pandas
  • NumPy
  • Seaborn
  • Matplotlib
  • tqdm
  • Weights & Biases (wandb), used only by hyperparameters search.ipynb

Repository Structure

  • analytics.ipynb: Data exploration, feature analysis, and visualization.
  • descriptor-based-model.ipynb: Implementation and evaluation of the QSPR model using molecular descriptors and LightGBM.
  • hyperparameters search.ipynb: Hyperparameter optimization for the GCN model using Weights & Biases.
  • model training and evaluation.ipynb: Training, evaluation, and analysis of the GCN model.
  • custom_dataset.py: Defines the custom PyTorch Geometric dataset for loading molecular graphs.
  • model.py: Contains the GCN model architecture definition.
  • trainer.py: Implements the training and validation loop for the GCN model.
  • utils.py: Utility functions used across notebooks (e.g., for evaluation metrics, plotting).
  • data/: Directory containing the dataset files.
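As an illustration of the kind of evaluation helper mentioned for utils.py, here is a minimal sketch of classification metrics in plain Python. It assumes binary labels (1 = soluble, 0 = insoluble), which may not match the class scheme used in this project, and the function name binary_classification_metrics and the toy labels are hypothetical, not taken from the repository. Scikit-Learn, listed in the dependencies, offers equivalent metrics that handle more cases.

```python
def binary_classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for six molecules, purely for demonstration.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters when the solubility classes are imbalanced, since accuracy alone can look good for a model that mostly predicts the majority class.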

Read a Summary

For a more accessible write-up of this project, see: Prediction of Aqueous Solubility

About

GNN, GCN, Molecular Solubility, RDKit, Cheminformatics
