- Full RNA support: The pipeline now fully supports RNA as a target (e.g., viral subgenomic RNA from SARS-CoV-2), including input models generated by SimRNA or similar tools.
- RNA-specific analyses:
  - Allosteric backbone exposure (Δexposure) proxy (“sasa proxy”)
  - Watson–Crick base pair persistence
  - Ligand–RNA atom contact tables
  - Carboxylate–OP1/OP2 binding analysis
  - π–π stacking (aromatic–aromatic) between ligand and RNA bases
- UI-driven analysis: All trajectory analyses (protein and RNA) are now launched via an interactive UI panel in Colab. Just check the boxes for analyses you want to run!
- Apo/holo comparison (“no-ligand” mode): Running molecular dynamics with or without ligand is now just a flag (`--no-ligand`) away. All overlay analyses (RMSD, RMSF, PCA, allostery, etc.) compare apo and holo runs.
- Flexible, transparent, and fast: All backend code is visible, runs in Colab, and outputs are auto-saved as images, CSVs, and PDBs for seamless downstream reporting and visualization.
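To make the π–π stacking item above concrete, here is a minimal geometric sketch of how a face-to-face stacking test between a ligand ring and a base ring could be written. It is not taken from the pipeline's backend: the function names, the 5.5 Å centroid cutoff, and the 30° angle cutoff are all assumptions chosen for illustration.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def ring_normal(points):
    """Unit normal of a near-planar ring, from two centroid-to-atom vectors."""
    c = centroid(points)
    u = [points[0][i] - c[i] for i in range(3)]
    v = [points[1][i] - c[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    return tuple(x / length for x in n)

def is_stacked(ring_a, ring_b, max_dist=5.5, max_angle=30.0):
    """Face-to-face test: close centroids and near-parallel ring planes.

    max_dist (angstrom) and max_angle (degrees) are assumed cutoffs.
    """
    dist = math.dist(centroid(ring_a), centroid(ring_b))
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    # abs() because a ring normal has no preferred sign.
    cos_angle = min(1.0, abs(sum(a * b for a, b in zip(na, nb))))
    angle = math.degrees(math.acos(cos_angle))
    return dist <= max_dist and angle <= max_angle

def stacking_fraction(frames):
    """Fraction of (ring_a, ring_b) frames that satisfy is_stacked."""
    return sum(is_stacked(a, b) for a, b in frames) / len(frames)

def hexagon(z=0.0, radius=1.4):
    """Idealized six-membered ring in the xy-plane at height z (toy data)."""
    return [(radius * math.cos(math.radians(60 * k)),
             radius * math.sin(math.radians(60 * k)), z) for k in range(6)]

frames = [(hexagon(0.0), hexagon(3.5)), (hexagon(0.0), hexagon(10.0))]
print(stacking_fraction(frames))
```

In a real trajectory the ring coordinates would come from the ligand's aromatic atoms and the base ring atoms in each frame; T-shaped or offset geometries would need additional criteria.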
A streamlined, Colab-optimized drug discovery pipeline integrating:
- ✅ Ligand-target prediction with a custom-trained DeepPurpose fork
- ✅ Structural docking using AutoDock Vina
- ✅ GPU-accelerated Molecular Dynamics with OpenMM and OpenFF
- ✅ RNA-enabled: analyze protein and RNA as targets, including viral subgenomic RNA (SARS-CoV-2 case studies)
- ✅ Mechanistic analyses: PCA, FEL, RMSD, H-bonds, water networks, allostery (Δexposure), π–π stacking, and more—now via an interactive Colab UI.
This repository demonstrates simulation and evaluation of ligand–protein/RNA interactions, with robust, user-guided exploratory analysis in Google Colab.
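As an illustration of what an apo/holo overlay of one of these metrics involves, the sketch below computes a per-frame RMSD series for two toy trajectories and reports the difference of their means. It assumes the frames are already superimposed on a common reference (no fitting is done here), and the function names are hypothetical rather than the pipeline's own.

```python
import math

def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) tuples, no fitting."""
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords))

def rmsd_series(trajectory):
    """RMSD of every frame against the first frame of the same trajectory."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]

def compare_apo_holo(apo_traj, holo_traj):
    """Mean RMSD of each run and the holo-minus-apo difference."""
    apo = rmsd_series(apo_traj)
    holo = rmsd_series(holo_traj)
    apo_mean = sum(apo) / len(apo)
    holo_mean = sum(holo) / len(holo)
    return {"apo_mean": apo_mean, "holo_mean": holo_mean,
            "delta": holo_mean - apo_mean}

# Toy two-atom trajectories: the apo run drifts further than the holo run.
apo_traj = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 2), (1, 0, 2)]]
holo_traj = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]]
print(compare_apo_holo(apo_traj, holo_traj))
```

A negative `delta` here would suggest the ligand-bound run stays closer to its starting structure, though a single mean is only a coarse summary of the full time series.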
This pipeline is designed for use in Google Colab, with full support for `condacolab`. A demo notebook is included in this repository to reproduce all steps.
All advanced trajectory analyses—including RNA-specific and allosteric functions—are performed via a UI-driven Colab panel (see below).
Paste the following at the very top of your Colab notebook:
!pip install -q condacolab
import condacolab
condacolab.install()
🔄 NOTE: This will crash your runtime once. That's expected.
After Colab restarts, run the following in a new cell to confirm the installation:
import condacolab
condacolab.check()
# Main pipeline repo (this one)
!git clone https://github.com/BioMolDynamics/DeepPurpose-MD-Discovery.git
# Custom fork of DeepPurpose (installed later)
!git clone https://github.com/BioMolDynamics/Deeppurpose
!wget https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_linux_x86_64
!chmod +x vina_1.2.5_linux_x86_64
!./vina_1.2.5_linux_x86_64 --version
!mamba env create -f environment.yml
# Install custom fork of DeepPurpose without overwriting key dependencies
!conda run -n deeppurpose-md-env pip install --no-deps ./Deeppurpose
# Install optional dependencies like Open Babel
!conda run -n deeppurpose-md-env python scripts/install_optional.py
!conda run -n deeppurpose-md-env python scripts/1_prepare_ligand.py "$ligand_smiles"
!conda run -n deeppurpose-md-env python scripts/2_prepare_receptor.py "$pdb_id" --strict-protein # or --rna for RNA
!conda run -n deeppurpose-md-env python scripts/3_docking_vina.py --use-residue-centroid
!conda run -n deeppurpose-md-env python scripts/3b_prepare_protein.py # For proteins
# RNA users: follow README/Colab for RNA-specific prep before MD
!conda run -n deeppurpose-md-env python scripts/4_align_ligand.py
!conda run -n deeppurpose-md-env python scripts/5_md_simulation.py --protein # or --rna for RNA targets; add --no-ligand for an apo run (see notebook for options)
!conda run -n deeppurpose-md-env python scripts/7_deeppurpose_training.py
!conda run -n deeppurpose-md-env python scripts/8_deeppurpose_prediction.py
Each script corresponds to a specific stage in the full drug discovery pipeline — from ligand design to MD simulation to deep learning prediction.
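The stage order above can also be assembled programmatically, which makes it easier to switch between protein and RNA targets or to add `--no-ligand`. The sketch below only builds and prints the command lines; it uses the script names and flags listed above, but `build_commands`, `stage`, and the example inputs (`"CCO"`, `"6LU7"`) are hypothetical, and it omits the RNA-specific preparation step as well as the training and prediction scripts.

```python
import shlex

ENV = "deeppurpose-md-env"  # conda environment name used in the commands above

def stage(script, *args):
    """Argument list for running one pipeline script inside the conda env."""
    return ["conda", "run", "-n", ENV, "python", f"scripts/{script}", *args]

def build_commands(ligand_smiles, pdb_id, target="protein", no_ligand=False):
    """Ordered commands for ligand prep through MD (scripts 1 to 5)."""
    if target not in ("protein", "rna"):
        raise ValueError("target must be 'protein' or 'rna'")
    receptor_flag = "--strict-protein" if target == "protein" else "--rna"
    md_args = [f"--{target}"] + (["--no-ligand"] if no_ligand else [])
    commands = [
        stage("1_prepare_ligand.py", ligand_smiles),
        stage("2_prepare_receptor.py", pdb_id, receptor_flag),
        stage("3_docking_vina.py", "--use-residue-centroid"),
    ]
    if target == "protein":
        # Protein-only preparation step; RNA prep is not modeled here.
        commands.append(stage("3b_prepare_protein.py"))
    commands.append(stage("4_align_ligand.py"))
    commands.append(stage("5_md_simulation.py", *md_args))
    return commands

# Illustrative inputs only: ethanol as the ligand, 6LU7 as the receptor.
for command in build_commands("CCO", "6LU7"):
    print(shlex.join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` so that a failing stage stops the run instead of silently continuing.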
Script 5b (the old analysis script) is deprecated. Step 6, trajectory analysis, now runs directly from a UI analysis panel in Colab: interactive checkboxes launch the trajectory analyses (RMSD, PCA, H-bonds, RNA allostery, π–π stacking, etc.), and no additional scripts are required.
All trajectory analysis is now performed via a UI panel in your Colab notebook:
- Select and run only the analyses you need (checkbox UI).
- All backend code is visible (for transparency/extensibility), but users only interact with the UI.
- RNA-specialized analyses are available (Watson–Crick pairs, backbone Δexposure, ligand–RNA atom contacts, π–π stacking, and more).
- Outputs are auto-saved as images, CSVs, and PDBs for downstream reporting or 3D visualization.
(See the demo Colab notebook for details and examples.)
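One way to picture the "run only what is checked" behavior is a small registry that maps analysis names to functions and runs the selected ones, saving each result as a CSV. This is a sketch under assumptions, not the notebook's actual backend: the registry, the two toy analyses, and the `distances` input are invented for illustration, and in Colab the `selection` dictionary would presumably be filled from checkbox states (for example the `.value` of `ipywidgets.Checkbox` widgets).

```python
import csv
import tempfile
from pathlib import Path

ANALYSES = {}  # analysis name -> function(data) returning a list of row dicts

def register(name):
    """Decorator that adds a function to the ANALYSES registry."""
    def decorator(func):
        ANALYSES[name] = func
        return func
    return decorator

@register("mean_distance")
def mean_distance(data):
    d = data["distances"]
    return [{"metric": "mean_distance", "value": sum(d) / len(d)}]

@register("contact_fraction")
def contact_fraction(data, cutoff=4.0):
    # Fraction of frames with the distance at or below an assumed cutoff.
    d = data["distances"]
    return [{"metric": "contact_fraction",
             "value": sum(x <= cutoff for x in d) / len(d)}]

def run_selected(selection, data, out_dir):
    """Run every analysis whose flag is True and save one CSV per analysis.

    selection maps analysis names to booleans, like checkbox states.
    """
    unknown = [name for name in selection if name not in ANALYSES]
    if unknown:
        raise KeyError(f"unknown analyses: {unknown}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, checked in selection.items():
        if not checked:
            continue
        rows = ANALYSES[name](data)
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        written[name] = path
    return written

demo_dir = tempfile.mkdtemp()
selection = {"mean_distance": True, "contact_fraction": False}
print(run_selected(selection, {"distances": [3.0, 5.0]}, demo_dir))
```

Keeping analyses in a registry means a new analysis only needs a decorated function; the selection and saving logic stays unchanged.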
This demo uses a COVID-19 specific subset of BindingDB, available from UC San Diego.
To simplify setup, we provide pre-cleaned versions of this dataset:
- `BindingDB_Covid-19.tsv` (214 MB, hosted via Zenodo)
- `strong_binders_cleaned.csv` (optional for filtering)
- `protein.faa` (optional for filtering)
- `metrics - SARS2 FASTA.csv` (matching data of SARS-CoV-2 proteins and FASTA)
📎 Dataset download link: https://doi.org/10.5281/zenodo.15613825
You are welcome to use your own SMILES/FASTA data by modifying `7_deeppurpose_training.py`.
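If you bring your own BindingDB-style TSV, a small preprocessing step can turn it into (SMILES, sequence, affinity) triples before training. The sketch below is a hypothetical helper, not part of `7_deeppurpose_training.py`: the column names are assumed to match BindingDB's TSV export headers and should be checked against your file, and censored values such as `>10000` are simply stripped of their qualifier, which is a simplification.

```python
import csv
import io
import math

# Assumed BindingDB TSV column headers; verify against the downloaded file.
SMILES_COL = "Ligand SMILES"
SEQ_COL = "BindingDB Target Chain Sequence"
IC50_COL = "IC50 (nM)"

def parse_nm(text):
    """Parse a value such as '120', '>10000' or '' into nM, or None."""
    text = text.strip().lstrip("<>=~ ")
    try:
        value = float(text)
    except ValueError:
        return None
    return value if value > 0 else None

def load_pairs(handle, max_ic50_nm=None):
    """Return (smiles, sequence, pIC50) triples from a tab-separated file.

    Rows with a missing SMILES, sequence, or IC50 are skipped. pIC50 is
    computed as 9 - log10(IC50 in nM).
    """
    pairs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        smiles = (row.get(SMILES_COL) or "").strip()
        sequence = (row.get(SEQ_COL) or "").strip()
        ic50 = parse_nm(row.get(IC50_COL) or "")
        if not smiles or not sequence or ic50 is None:
            continue
        if max_ic50_nm is not None and ic50 > max_ic50_nm:
            continue
        pairs.append((smiles, sequence, 9.0 - math.log10(ic50)))
    return pairs

# Tiny in-memory stand-in for a real TSV file (made-up rows).
demo_tsv = (
    f"{SMILES_COL}\t{SEQ_COL}\t{IC50_COL}\n"
    "CCO\tMKV\t1000\n"
    "CCN\tMKV\t>10000\n"
    "\tMKV\t5\n"
)
print(load_pairs(io.StringIO(demo_tsv), max_ic50_nm=1000))
```

For a real file, replace the `io.StringIO` stand-in with `open(path, newline="", encoding="utf-8")`.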
- Fully RNA-capable: Run all stages (including MD, contact analysis, allostery, π–π stacking) for RNA targets.
- Interactive, modular trajectory analysis: UI lets you select any combination of analyses, including custom, RNA-specific ones.
- Robust error handling: clear feedback if input files/definitions are missing.
- All outputs are automatically saved (figures, CSV, PDB).
- Colab-native: No installation required outside the notebook.
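As an illustration of the "clear feedback if input files are missing" idea, the sketch below checks a working directory for required inputs and raises one error that lists everything missing. The helper, the exception class, and the file names (`trajectory.dcd`, `topology.pdb`) are hypothetical and are not taken from the pipeline.

```python
import tempfile
from pathlib import Path

class MissingInputError(FileNotFoundError):
    """Raised when one or more required input files are absent."""

def require_inputs(work_dir, required):
    """Return {label: path} for all required files, or raise with a full list.

    required maps a human-readable label to a file name inside work_dir.
    """
    work_dir = Path(work_dir)
    paths = {label: work_dir / name for label, name in required.items()}
    missing = {label: p for label, p in paths.items() if not p.is_file()}
    if missing:
        details = "\n".join(f"  - {label}: expected {p}"
                            for label, p in missing.items())
        raise MissingInputError("Missing input file(s):\n" + details)
    return paths

# Demo: only the trajectory exists, so the topology is reported as missing.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "trajectory.dcd").write_text("placeholder")
try:
    require_inputs(demo_dir, {"trajectory": "trajectory.dcd",
                              "topology": "topology.pdb"})
except MissingInputError as error:
    print(error)
```

Reporting every missing file in one message avoids the fix-one-rerun-find-the-next cycle that is costly in a notebook session.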
MIT License. Please cite this repository if used in academic work.
If you use DeepPurpose-MD in your work, please cite:
Mochizuki, I. (2025). DeepPurpose-MD: An End-to-End Colab-Based Drug Discovery Pipeline Integrating Docking, Molecular Dynamics, and Deep Learning. Zenodo. https://doi.org/10.5281/zenodo.15613825
Maintained by BioMolDynamics. For academic inquiries, collaboration, or feedback, please open an issue.