End-to-end pipeline for Colab-based drug discovery integrating DeepPurpose, AutoDock Vina, and OpenMM. Supports DTI, docking, MD, and FEL analysis — no HPC cluster needed.

BioMolDynamics/DeepPurpose-MD-Discovery

🆕 Updates – 1 August 2025

  • Full RNA support: The pipeline now fully supports RNA as a target (e.g., viral subgenomic RNA from SARS-CoV-2), including input models generated by SimRNA or similar tools.

  • RNA-specific analyses:

    • Allosteric backbone exposure (Δexposure) proxy (“sasa proxy”)

    • Watson–Crick base pair persistence

    • Ligand–RNA atom contact tables

    • Carboxylate–OP1/OP2 binding analysis

    • π–π stacking (aromatic–aromatic) between ligand and RNA bases

  • UI-driven analysis: All trajectory analyses (protein and RNA) are now launched via an interactive UI panel in Colab. Just check the boxes for analyses you want to run!

  • Apo/holo comparison (“no-ligand” mode): Run molecular dynamics without the ligand by adding the --no-ligand flag. All overlay analyses (RMSD, RMSF, PCA, allostery, etc.) then compare the apo and holo runs.

  • Flexible, transparent, and fast: All backend code is visible, runs in Colab, and outputs are auto-saved as images, CSVs, and PDBs for seamless downstream reporting and visualization.
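An apo/holo overlay comes down to computing the same per-frame metric for both runs and comparing the results. The following is a minimal sketch in plain Python, not the pipeline's own code: it assumes each trajectory is already superposed on its first frame and is passed as a list of frames, each a list of (x, y, z) tuples. The function names `rmsd`, `rmsd_series`, and `compare_apo_holo` are illustrative.

```python
import math

def rmsd(frame, reference):
    """RMSD between two equally sized coordinate lists (no fitting is done here)."""
    if len(frame) != len(reference):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))

def rmsd_series(trajectory):
    """RMSD of every frame relative to the first frame of the same trajectory."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]

def compare_apo_holo(apo, holo):
    """Mean RMSD of each run plus the holo-minus-apo difference."""
    apo_mean = sum(rmsd_series(apo)) / len(apo)
    holo_mean = sum(rmsd_series(holo)) / len(holo)
    return {"apo": apo_mean, "holo": holo_mean, "delta": holo_mean - apo_mean}

# Toy data (hypothetical): two atoms, three frames per run
apo = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)], [(0, 0, 2), (1, 0, 2)]]
holo = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]]
summary = compare_apo_holo(apo, holo)
```

A negative `delta` means the holo run stayed closer to its starting structure than the apo run. Real trajectories would need a superposition step first, which this sketch leaves out.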

🧬 DeepPurpose-MD-Discovery

A streamlined, Colab-optimized drug discovery pipeline integrating:

  • ✅ Ligand-target prediction with a custom-trained DeepPurpose fork
  • ✅ Structural docking using AutoDock Vina
  • ✅ GPU-accelerated Molecular Dynamics with OpenMM and OpenFF
  • ✅ RNA-enabled: analyze proteins and RNA as targets, including viral subgenomic RNA (SARS-CoV-2 case studies)
  • ✅ Mechanistic analyses: PCA, FEL, RMSD, H-bonds, water networks, allostery (Δexposure), π–π stacking, and more—now via an interactive Colab UI.

This repository demonstrates simulation and evaluation of ligand–protein/RNA interactions, with robust, user-guided exploratory analysis in Google Colab.
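For readers unfamiliar with the FEL analysis listed above, a free-energy landscape is commonly estimated from the population of bins in a two-dimensional projection (for example PC1 and PC2) as F = -kT ln(P / P_max). The sketch below illustrates that formula in plain Python. The bin width, the 300 K temperature, and the function name `free_energy_landscape` are assumptions made for the example and do not describe the pipeline's implementation.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)

def free_energy_landscape(pc1, pc2, bin_width=0.5, temperature=300.0):
    """Map each occupied (PC1, PC2) bin to F = -kT ln(P / P_max) in kJ/mol."""
    if len(pc1) != len(pc2) or not pc1:
        raise ValueError("pc1 and pc2 must be non-empty and equally long")
    # Count how many frames fall into each square bin of the projection
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    max_count = max(counts.values())
    kt = K_B * temperature
    # The most populated bin gets F = 0; rarer bins get higher free energy
    return {bin_id: -kt * math.log(n / max_count) for bin_id, n in counts.items()}

# Toy projection (hypothetical values): three frames in one basin, one outlier
pc1 = [0.1, 0.2, 0.3, 1.1]
pc2 = [0.1, 0.2, 0.4, 1.2]
fel = free_energy_landscape(pc1, pc2, bin_width=0.5)
```

Unoccupied bins are simply absent from the result, which avoids taking the logarithm of zero.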


🔧 Setup Instructions (Google Colab)

This pipeline is designed for use in Google Colab, with full support for condacolab. A demo notebook is included in this repository to reproduce all steps. All advanced trajectory analyses—including RNA-specific and allosteric functions—are performed via a UI-driven Colab panel (see below).


✅ Step 1: Enable Conda in Colab

Paste the following at the very top of your Colab notebook:

!pip install -q condacolab
import condacolab
condacolab.install()

🔄 NOTE: This restarts your runtime once, and Colab may report it as a crash. That's expected.

After Colab restarts, run the following cell to confirm that Conda is available:

import condacolab
condacolab.check()

✅ Step 2: Clone Required Repositories

# Main pipeline repo (this one)
!git clone https://github.com/BioMolDynamics/DeepPurpose-MD-Discovery.git

# Custom fork of DeepPurpose (installed later)
!git clone https://github.com/BioMolDynamics/Deeppurpose

✅ Step 3: Download AutoDock Vina Binary

!wget https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_linux_x86_64
!chmod +x vina_1.2.5_linux_x86_64
!./vina_1.2.5_linux_x86_64 --version
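
Once the binary reports its version, a docking call needs a search-box centre. The sketch below shows one way a residue centroid could be turned into a Vina command line. It is an illustration, not the code in 3_docking_vina.py: the helper names, the file names `receptor.pdbqt` and `ligand.pdbqt`, the 20 Å box, and the toy residue are all assumptions. The Vina options used (`--receptor`, `--ligand`, `--center_*`, `--size_*`, `--exhaustiveness`, `--out`) are standard Vina CLI flags.

```python
def residue_centroid(pdb_text, chain, resseq):
    """Average the coordinates of all atoms of one residue (fixed-column PDB records)."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 54:
            continue
        # PDB columns: chain ID = 22, residue number = 23-26, x/y/z = 31-54
        if line[21] == chain and int(line[22:26]) == resseq:
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"residue {chain}:{resseq} not found")
    return tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))

def vina_command(receptor, ligand, center, size=(20.0, 20.0, 20.0),
                 exhaustiveness=8, out="docked.pdbqt"):
    """Build the argument list for a Vina run with a box centred on `center`."""
    cmd = ["./vina_1.2.5_linux_x86_64", "--receptor", receptor, "--ligand", ligand]
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", f"{c:.3f}", f"--size_{axis}", f"{s:.3f}"]
    return cmd + ["--exhaustiveness", str(exhaustiveness), "--out", out]

def pdb_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal ATOM record with correct column positions (toy data only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Hypothetical two-atom residue used only to exercise the functions
pdb_text = "\n".join([
    pdb_atom_line(1, "N", "HIS", "A", 41, 10.0, 20.0, 30.0),
    pdb_atom_line(2, "CA", "HIS", "A", 41, 12.0, 22.0, 32.0),
])
center = residue_centroid(pdb_text, "A", 41)
cmd = vina_command("receptor.pdbqt", "ligand.pdbqt", center)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once real PDBQT files exist.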

✅ Step 4: Install Conda Environment

!mamba env create -f environment.yml

✅ Step 5: Finalize Setup

# Install custom fork of DeepPurpose without overwriting key dependencies
!conda run -n deeppurpose-md-env pip install --no-deps ./Deeppurpose

# Install optional dependencies like Open Babel
!conda run -n deeppurpose-md-env python scripts/install_optional.py

✅ Step 6: Run the Pipeline

!conda run -n deeppurpose-md-env python scripts/1_prepare_ligand.py "$ligand_smiles"
!conda run -n deeppurpose-md-env python scripts/2_prepare_receptor.py "$pdb_id" --strict-protein  # or --rna for RNA
!conda run -n deeppurpose-md-env python scripts/3_docking_vina.py --use-residue-centroid
!conda run -n deeppurpose-md-env python scripts/3b_prepare_protein.py  # For proteins
# RNA users: follow the demo Colab notebook for RNA-specific prep before MD
!conda run -n deeppurpose-md-env python scripts/4_align_ligand.py
!conda run -n deeppurpose-md-env python scripts/5_md_simulation.py --protein  # or --rna; add --no-ligand for an apo run (see notebook for options)
!conda run -n deeppurpose-md-env python scripts/7_deeppurpose_training.py
!conda run -n deeppurpose-md-env python scripts/8_deeppurpose_prediction.py

Each script corresponds to a specific stage in the full drug discovery pipeline — from ligand design to MD simulation to deep learning prediction.
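
For unattended runs, the stages above could also be chained from Python so that a failure stops the pipeline early. This is a sketch under assumptions, not a script shipped with the repository: `build_stage_commands` and `run_pipeline` are hypothetical helpers, the RNA branch simply skips 3b_prepare_protein.py and does not cover the RNA-specific prep, and the SMILES string and PDB ID are placeholders.

```python
import subprocess

ENV = "deeppurpose-md-env"

def build_stage_commands(ligand_smiles, pdb_id, target="protein", no_ligand=False):
    """Return the preparation, docking, and MD commands in execution order."""
    if target not in ("protein", "rna"):
        raise ValueError("target must be 'protein' or 'rna'")
    run = ["conda", "run", "-n", ENV, "python"]
    receptor_flag = "--strict-protein" if target == "protein" else "--rna"
    stages = [
        run + ["scripts/1_prepare_ligand.py", ligand_smiles],
        run + ["scripts/2_prepare_receptor.py", pdb_id, receptor_flag],
        run + ["scripts/3_docking_vina.py", "--use-residue-centroid"],
    ]
    if target == "protein":
        stages.append(run + ["scripts/3b_prepare_protein.py"])
    stages.append(run + ["scripts/4_align_ligand.py"])
    md = run + ["scripts/5_md_simulation.py", f"--{target}"]
    if no_ligand:
        md.append("--no-ligand")  # apo run
    stages.append(md)
    return stages

def run_pipeline(commands):
    """Run each stage in turn; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

# Placeholder inputs: ethanol SMILES and an example PDB ID
commands = build_stage_commands("CCO", "6LU7")
apo_rna = build_stage_commands("CCO", "6LU7", target="rna", no_ligand=True)
```

Passing arguments as a list avoids shell quoting problems with SMILES strings, which often contain characters such as parentheses and `=`. Calling `run_pipeline(commands)` only makes sense inside the prepared Colab environment.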

Script 5b (the old analysis script) is deprecated. The analysis stage (formerly script 6) now runs directly from the UI analysis panel in Colab: interactive checkboxes launch all trajectory analyses (RMSD, PCA, H-bonds, RNA allostery, π–π stacking, etc.), and no additional scripts are required.

🖥️ Interactive Analysis: UI-Driven in Colab

All trajectory analysis is now performed via a UI panel in your Colab notebook:

  • Select and run only the analyses you need (checkbox UI).

  • All backend code is visible (for transparency/extensibility), but users only interact with the UI.

  • RNA-specialized analyses are available (Watson–Crick pairs, backbone Δexposure, ligand–RNA atom contacts, π–π stacking, and more).

  • Outputs are auto-saved as images, CSVs, and PDBs for downstream reporting or 3D visualization.

(See the demo Colab notebook for details and examples.)
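
To give a sense of what one of these analyses computes, π–π stacking between a ligand ring and an RNA base is usually detected from two geometric quantities: the distance between ring centroids and the angle between ring normals. The sketch below is a plain-Python illustration with assumed cut-offs (5.5 Å and 30°) and hypothetical function names. The notebook's own criteria may differ.

```python
import math

def centroid(points):
    """Geometric centre of a list of (x, y, z) points."""
    return tuple(sum(p[i] for p in points) / len(points) for i in range(3))

def ring_normal(points):
    """Unit normal of a (near-)planar ring via Newell's method; atoms must be in ring order."""
    pts = list(points)
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(pts, pts[1:] + pts[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate ring: cannot compute a normal")
    return (nx / length, ny / length, nz / length)

def stacking_geometry(ring_a, ring_b):
    """Return (centroid distance, angle between ring planes in degrees)."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    dot = sum(a * b for a, b in zip(ring_normal(ring_a), ring_normal(ring_b)))
    angle = math.degrees(math.acos(min(1.0, abs(dot))))  # sign of the normal is irrelevant
    return distance, angle

def is_stacked(ring_a, ring_b, max_distance=5.5, max_angle=30.0):
    """Face-to-face stacking test using the assumed cut-offs."""
    distance, angle = stacking_geometry(ring_a, ring_b)
    return distance <= max_distance and angle <= max_angle

def hexagon(z, radius=1.4):
    """Toy six-membered ring in the xy-plane at height z."""
    return [(radius * math.cos(k * math.pi / 3), radius * math.sin(k * math.pi / 3), z)
            for k in range(6)]

ring_a = hexagon(0.0)
ring_b = hexagon(3.5)                               # parallel ring, 3.5 Å above
ring_c = [(x, 3.5, y) for x, y, _ in hexagon(0.0)]  # perpendicular ring
distance, angle = stacking_geometry(ring_a, ring_b)
```

Applied per frame, the fraction of frames where `is_stacked` is true gives a simple stacking-persistence measure.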

🧪 Dataset Information

This demo uses a COVID-19-specific subset of BindingDB, available from UC San Diego.
To simplify setup, we provide pre-cleaned versions of this dataset:

  • BindingDB_Covid-19.tsv (214 MB, hosted via Zenodo)
  • strong_binders_cleaned.csv (optional for filtering)
  • protein.faa (optional for filtering)
  • metrics - SARS2 FASTA.csv (SARS-CoV-2 proteins matched to their FASTA sequences)

📎 Dataset download link: https://doi.org/10.5281/zenodo.15613825

You are welcome to use your own SMILES/FASTA data by modifying 7_deeppurpose_training.py.
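
When bringing your own data, the training script ultimately needs (SMILES, target sequence, affinity) triples. The following sketch shows one way to extract such triples from a BindingDB-style TSV using only the standard library. The column names `Ligand SMILES`, `BindingDB Target Chain Sequence`, and `IC50 (nM)`, the 1000 nM cut-off, and the function name `strong_binders` are assumptions, so check them against your file's header and against what 7_deeppurpose_training.py expects.

```python
import csv
import io

def strong_binders(handle, affinity_column="IC50 (nM)", threshold_nm=1000.0):
    """Yield (smiles, sequence, affinity_nM) for rows at or below the threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        smiles = (row.get("Ligand SMILES") or "").strip()
        sequence = (row.get("BindingDB Target Chain Sequence") or "").strip()
        raw = (row.get(affinity_column) or "").strip()
        if raw.startswith(">"):
            continue  # censored value: only a lower bound is known
        try:
            value = float(raw.lstrip("<"))
        except ValueError:
            continue  # empty or non-numeric affinity
        if smiles and sequence and value <= threshold_nm:
            yield smiles, sequence, value

# Tiny in-memory example with made-up rows
sample = io.StringIO(
    "Ligand SMILES\tBindingDB Target Chain Sequence\tIC50 (nM)\n"
    "CCO\tMKT\t50\n"
    "CCN\tMKT\t>10000\n"
    "CCC\tMKT\t\n"
    "CCCl\tMKT\t<10\n"
)
pairs = list(strong_binders(sample))
```

For the real file, pass an open handle instead, for example `open("BindingDB_Covid-19.tsv", encoding="utf-8", newline="")`. Because the function is a generator, the 214 MB file is streamed row by row rather than loaded at once.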

🧬 Features

  • Fully RNA-capable: Run all stages (including MD, contact analysis, allostery, π–π stacking) for RNA targets.

  • Interactive, modular trajectory analysis: UI lets you select any combination of analyses, including custom, RNA-specific ones.

  • Robust error handling: clear feedback if input files/definitions are missing.

  • All outputs are automatically saved (figures, CSV, PDB).

  • Colab-native: No installation required outside the notebook.
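
The error-handling idea above can be illustrated with a small pre-flight check that names every missing input before a stage starts. This is a sketch only: `missing_inputs`, `require_inputs`, and the file names are hypothetical and are not taken from the pipeline scripts.

```python
import tempfile
from pathlib import Path

def missing_inputs(required, base_dir="."):
    """Return the names in `required` that are not regular files under base_dir."""
    base = Path(base_dir)
    return [name for name in required if not (base / name).is_file()]

def require_inputs(required, base_dir="."):
    """Raise one clear error listing every missing file."""
    missing = missing_inputs(required, base_dir)
    if missing:
        raise FileNotFoundError("Missing input file(s): " + ", ".join(missing))

# Demonstration in a throwaway directory containing only one of two expected files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "receptor.pdb").write_text("END\n")
    missing = missing_inputs(["receptor.pdb", "ligand.sdf"], tmp)
    try:
        require_inputs(["receptor.pdb", "ligand.sdf"], tmp)
        message = ""
    except FileNotFoundError as exc:
        message = str(exc)
```

Reporting all missing files at once saves a rerun for each absent input.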

📜 License

MIT License. Please cite this repository if used in academic work.

📖 Citation

If you use DeepPurpose-MD in your work, please cite:

Mochizuki, I. (2025). DeepPurpose-MD: An End-to-End Colab-Based Drug Discovery Pipeline Integrating Docking, Molecular Dynamics, and Deep Learning. Zenodo. https://doi.org/10.5281/zenodo.15613825

📫 Contact

Maintained by BioMolDynamics. For academic inquiries, collaboration, or feedback, please open an issue.
