BeckResearchLab/Solvation

 
 

Solvation Meta Predictor

This repository contains code for predicting the aqueous solubility of organic molecules using machine learning models. The models and dataset are based on the research paper: Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations.

Usage

  0. Pull Original Code
  • Pull the pnnlsolpaper folder from the original repository:
# pull the original PNNL codebase
git submodule init
git submodule update
  • Then apply the patch set:
bash apply_patches.bash
  1. Download Data: Download the dataset file named dataset.csv from this link and save it as data.csv in the ./data folder.
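Before generating features, it can help to confirm that the saved file is where you expect it and that it parses as CSV. The following is a minimal sketch using only the standard library. The helper name `summarize_csv` is hypothetical and not part of this repository, and it makes no assumption about which columns the dataset contains.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"expected dataset at {path}")
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # first line is treated as the header
        if header is None:
            raise ValueError(f"{path} is empty")
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return header, row_count
```

For example, `summarize_csv("data/data.csv")` returns the header and the number of data rows. The path is an assumption, so point it at wherever you saved data.csv.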

  2. Generate Features:

    • Generate Pybel coordinates and molecular descriptor model (MDM) features by running create_data.py in the pnnlsolpaper/data folder:
      cd ./pnnlsolpaper/data
      python create_data.py
    • Then return to the root folder:
      cd ../..
  3. Train Models:

    • To train the MDM model, run pnnlsolpaper/mdm/train.py as a module (the commands in this step assume you are in the repository root):
      python -m pnnlsolpaper.mdm.train
    • To train the GNN model, run pnnlsolpaper/gnn/train.py:
      python -m pnnlsolpaper.gnn.train
    • To train the SMI model, run pnnlsolpaper/smi/train.py:
      python -m pnnlsolpaper.smi.train
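The three training commands above can also be driven from one script. This is a sketch rather than part of the repository: the module names come from the commands above, while `build_train_command` and `train_all` are hypothetical helpers. It assumes it is run from the repository root with the same Python environment you would use on the command line.

```python
import subprocess
import sys

# Module names taken from the training commands above.
TRAIN_MODULES = (
    "pnnlsolpaper.mdm.train",
    "pnnlsolpaper.gnn.train",
    "pnnlsolpaper.smi.train",
)


def build_train_command(module):
    """Return the argv list equivalent to `python -m <module>`."""
    return [sys.executable, "-m", module]


def train_all(modules=TRAIN_MODULES, runner=subprocess.run):
    """Run each training module in order, stopping at the first failure."""
    for module in modules:
        # check=True makes subprocess.run raise CalledProcessError
        # when the training process exits with a non-zero status.
        runner(build_train_command(module), check=True)
```

Calling `train_all()` trains MDM, GNN, and SMI one after another. The `runner` parameter exists only so the function can be exercised without launching real training jobs.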
  4. Make Predictions:
    (NOTE: this step is optional)

    • Use the predict.ipynb notebook in each model's folder to make predictions:
      cd pnnlsolpaper/mdm/
      jupyter notebook predict.ipynb
      Repeat the above steps for the gnn and smi folders.
    • Afterwards return to the root directory:
      cd ../..
  5. Ensemble Models:

    • To ensemble the models, run the following scripts from the ensemble folder:
      cd ensemble/
      python CV.py
      python Optuna.py
      python KNN.py
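The contents of CV.py, Optuna.py, and KNN.py are not shown here, so the following is only a generic illustration of what combining model outputs can look like, not a reproduction of those scripts. It assumes each model yields one predicted value per molecule, in the same order. The function name `weighted_ensemble` and the example weights are hypothetical.

```python
def weighted_ensemble(predictions, weights):
    """Combine per-model prediction lists into one weighted-average list.

    predictions: dict mapping model name -> list of predicted values
    weights:     dict mapping model name -> non-negative weight
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lengths = {len(values) for values in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of molecules")
    n = lengths.pop()
    # Weighted mean of the models' predictions for each molecule.
    return [
        sum(weights[name] * predictions[name][i] for name in predictions) / total
        for i in range(n)
    ]
```

With equal weights this reduces to a plain average. A weight search (for instance with a hyperparameter optimizer) would call a function like this repeatedly and keep the weights with the lowest validation error.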
  6. Compare Predictions:

    • To compare predictions from individual models with ensemble methods, use the ensemble_prediction.ipynb notebook:
      jupyter notebook ensemble_prediction.ipynb
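As a rough picture of what such a comparison involves, the sketch below scores several sets of predictions against measured values and ranks them. Root-mean-square error is used here because it is a common regression metric; whether ensemble_prediction.ipynb uses the same metric is an assumption. The function names are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def rank_by_rmse(predictions, observed):
    """Return (name, rmse) pairs sorted from lowest to highest error."""
    scores = {name: rmse(values, observed) for name, values in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Passing a dict that holds each individual model's predictions plus the ensemble's predictions gives a single ranked list, which makes it easy to see whether the ensemble improves on its members.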

Solvation Meta Predictor Performance

[Figure: Solvation Meta Predictor performance]

Additional Information

For details on the model architectures, data featurization, and training setup, refer to the original research paper cited above. The methods described there are essential background for understanding and using this repository effectively.

About

Fork of the ensemble modeling architecture for solvation prediction
