🆕 Updates – 1 August 2025

Full RNA support: RNA can now be used as a target (e.g., viral subgenomic RNA from SARS-CoV-2), including input models generated by SimRNA or similar tools.
RNA-specific analyses:
- Allosteric backbone-exposure proxy (Δexposure, a “SASA proxy”)
- Watson–Crick base pair persistence
- Ligand–RNA atom contact tables
- Carboxylate–OP1/OP2 binding analysis
- π–π stacking (aromatic–aromatic) between ligand and RNA bases
UI-driven analysis: All trajectory analyses (protein and RNA) are now launched via an interactive UI panel in Colab. Just check the boxes for analyses you want to run!
Apo/holo comparison (“no-ligand” mode): Running molecular dynamics with or without ligand is now just a flag (--no-ligand) away. All overlay analyses (RMSD, RMSF, PCA, allostery, etc.) compare apo and holo runs.
Flexible, transparent, and fast: All backend code is visible, runs in Colab, and outputs are auto-saved as images, CSVs, and PDBs for seamless downstream reporting and visualization.
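The apo/holo overlay comparison described above can be illustrated with per-residue RMSF. The sketch below is a minimal, stdlib-only illustration, not the pipeline's code. It assumes each trajectory is already superposed onto a common reference and that each frame is a list of (x, y, z) tuples, one per residue (e.g., Cα or P atoms). `rmsf` and `delta_rmsf` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Vec]]) -> List[float]:
    """Per-atom RMSF: sqrt of the mean squared deviation from the mean position.

    Assumes frames are already aligned to a common reference structure.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that mean position
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def delta_rmsf(holo: Sequence[Sequence[Vec]], apo: Sequence[Sequence[Vec]]) -> List[float]:
    """Holo minus apo RMSF; negative values mean lower fluctuation in the holo run."""
    return [h - a for h, a in zip(rmsf(holo), rmsf(apo))]
```

For example, `delta_rmsf(holo_frames, apo_frames)` returns one value per residue, which can be plotted directly as an apo/holo overlay difference.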

🧬 DeepPurpose-MD-Discovery

A streamlined, Colab-optimized drug discovery pipeline integrating:

✅ Ligand-target prediction with a custom-trained DeepPurpose fork
✅ Structural docking using AutoDock Vina
✅ GPU-accelerated Molecular Dynamics with OpenMM and OpenFF
✅ RNA-enabled: analyze protein and RNA as targets, including viral subgenomic RNA (SARS-CoV-2 case studies)
✅ Mechanistic analyses: PCA, FEL, RMSD, H-bonds, water networks, allostery (Δexposure), π–π stacking, and more—now via an interactive Colab UI.
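A free-energy landscape (FEL) is commonly built from the joint distribution of projections onto the first two principal components, F = -k_B T ln(P / P_max). The stdlib sketch below assumes you already have PC1/PC2 projections per frame (for example, from a PCA of aligned Cα coordinates). The function name, the 20-bin grid, and the 300 K default are illustrative choices, not the pipeline's settings.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def free_energy_landscape(
    pc1: Sequence[float],
    pc2: Sequence[float],
    bins: int = 20,
    temperature: float = 300.0,
) -> Dict[Tuple[int, int], float]:
    """Return F (kJ/mol) per occupied (i, j) bin, with the most populated bin at 0."""
    lo1, hi1 = min(pc1), max(pc1)
    lo2, hi2 = min(pc2), max(pc2)
    w1 = (hi1 - lo1) / bins or 1.0  # avoid a zero bin width if a PC is constant
    w2 = (hi2 - lo2) / bins or 1.0

    def to_bin(v: float, lo: float, w: float) -> int:
        # Clamp so the maximum value falls in the last bin, not in bin `bins`
        return min(int((v - lo) / w), bins - 1)

    counts = Counter(
        (to_bin(a, lo1, w1), to_bin(b, lo2, w2)) for a, b in zip(pc1, pc2)
    )
    p_max = max(counts.values())
    kt = KB_KJ_PER_MOL_K * temperature
    # F = -kT ln(P / P_max); empty bins are omitted because F would be +infinity
    return {cell: -kt * math.log(n / p_max) for cell, n in counts.items()}
```

Reporting F relative to the most populated bin keeps the global minimum at 0 kJ/mol, which makes apo and holo landscapes easier to compare on one color scale.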

This repository demonstrates simulation and evaluation of ligand–protein/RNA interactions, with robust, user-guided exploratory analysis in Google Colab.
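Ligand–target contact tables, such as the ligand–RNA atom contacts listed in the updates, are typically contact frequencies over frames. The sketch below is a brute-force, stdlib-only illustration, not the pipeline's definition. The 4.0 Å cutoff, the atom labels, and the `contact_table` name are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def contact_table(
    frames: Sequence[Tuple[Sequence[Vec], Sequence[Vec]]],
    ligand_names: Sequence[str],
    target_names: Sequence[str],
    cutoff: float = 4.0,
) -> List[Tuple[str, str, float]]:
    """Return (ligand atom, target atom, contact frequency) rows, most frequent first.

    Each frame is (ligand coordinates, target coordinates) in Å.
    """
    counts: Counter = Counter()
    for lig_xyz, tgt_xyz in frames:
        # O(N_ligand * N_target) per frame; fine for small ligands and pocket atoms
        for i, a in enumerate(lig_xyz):
            for j, b in enumerate(tgt_xyz):
                if math.dist(a, b) <= cutoff:
                    counts[(ligand_names[i], target_names[j])] += 1
    n = len(frames)
    return sorted(
        ((lig, tgt, c / n) for (lig, tgt), c in counts.items()),
        key=lambda row: (-row[2], row[0], row[1]),
    )
```

The rows can be written straight to a CSV with `csv.writer(...).writerows(rows)`.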

🔧 Setup Instructions (Google Colab)

This pipeline is designed for use in Google Colab, with full support for condacolab. A demo notebook (Pipeline_Demo.ipynb) is included in this repository to reproduce all steps. All advanced trajectory analyses, including the RNA-specific and allosteric functions, are performed via a UI-driven Colab panel (see below).

✅ Step 1: Enable Conda in Colab

Paste the following at the very top of your Colab notebook:

!pip install -q condacolab
import condacolab
condacolab.install()

🔄 NOTE: This restarts the Colab runtime once (Colab may report it as a crash). That's expected.

After Colab restarts, run the following cell to confirm that Conda is available:

import condacolab
condacolab.check()

✅ Step 2: Clone Required Repositories

# Main pipeline repo (this one)
!git clone https://github.com/BioMolDynamics/DeepPurpose-MD-Discovery.git

# Custom fork of DeepPurpose (installed later)
!git clone https://github.com/BioMolDynamics/Deeppurpose

✅ Step 3: Download AutoDock Vina Binary

!wget https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_linux_x86_64
!chmod +x vina_1.2.5_linux_x86_64
!./vina_1.2.5_linux_x86_64 --version
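For reference, the downloaded binary is driven by a receptor, a ligand, and a search box. How `3_docking_vina.py` calls it is not shown here, so the helper below is a hypothetical sketch that only assembles standard AutoDock Vina 1.2.x command-line options. The file names, the 20 Å box, and the exhaustiveness of 8 are placeholder assumptions.

```python
from typing import List, Sequence


def vina_command(
    receptor: str,
    ligand: str,
    center: Sequence[float],
    size: Sequence[float] = (20.0, 20.0, 20.0),
    exhaustiveness: int = 8,
    out: str = "docked.pdbqt",
    binary: str = "./vina_1.2.5_linux_x86_64",
) -> List[str]:
    """Build an AutoDock Vina argument list for a box-based docking run."""
    cmd = [binary, "--receptor", receptor, "--ligand", ligand]
    # One center/size pair per axis, in Å
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", f"{c:.3f}", f"--size_{axis}", f"{s:.3f}"]
    cmd += ["--exhaustiveness", str(exhaustiveness), "--out", out]
    return cmd


# Usage (placeholder file names):
# import subprocess
# subprocess.run(vina_command("receptor.pdbqt", "ligand.pdbqt", (10.0, 5.0, -3.2)), check=True)
```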

✅ Step 4: Install Conda Environment

!mamba env create -f environment.yml

✅ Step 5: Finalize Setup

# Install custom fork of DeepPurpose without overwriting key dependencies
!conda run -n deeppurpose-md-env pip install --no-deps ./Deeppurpose

# Install optional dependencies like Open Babel
!conda run -n deeppurpose-md-env python scripts/install_optional.py

✅ Step 6: Run the Pipeline

!conda run -n deeppurpose-md-env python scripts/1_prepare_ligand.py "$ligand_smiles"
!conda run -n deeppurpose-md-env python scripts/2_prepare_receptor.py "$pdb_id" --strict-protein  # or --rna for RNA
!conda run -n deeppurpose-md-env python scripts/3_docking_vina.py --use-residue-centroid
!conda run -n deeppurpose-md-env python scripts/3b_prepare_protein.py  # For proteins
# RNA users: follow the demo notebook for RNA-specific preparation before MD
!conda run -n deeppurpose-md-env python scripts/4_align_ligand.py
!conda run -n deeppurpose-md-env python scripts/5_md_simulation.py --protein  # use --rna for RNA targets; add --no-ligand for an apo run (see notebook for options)
!conda run -n deeppurpose-md-env python scripts/7_deeppurpose_training.py
!conda run -n deeppurpose-md-env python scripts/8_deeppurpose_prediction.py

Each script corresponds to one stage of the drug discovery pipeline, from ligand preparation through MD simulation to deep learning prediction.
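The `--use-residue-centroid` option in the docking step suggests that the search box is centered on selected residues. The script's internals are not shown here, so the following is a hypothetical stdlib sketch of that idea: it averages the coordinates of all ATOM/HETATM records of chosen residues, reading the fixed PDB columns. `residue_centroid` is an illustrative name, and the residue selection is up to you (for SARS-CoV-2 Mpro, for instance, the catalytic dyad His41/Cys145).

```python
from typing import Iterable, Optional, Tuple


def residue_centroid(
    pdb_text: str, residues: Iterable[int], chain: Optional[str] = None
) -> Tuple[float, float, float]:
    """Mean x, y, z of all ATOM/HETATM records whose residue number is in `residues`."""
    wanted = set(residues)
    xs, ys, zs = [], [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-width PDB columns: chain ID at column 22, residue number at 23-26
        if chain is not None and line[21] != chain:
            continue
        if int(line[22:26]) not in wanted:
            continue
        # Coordinates at columns 31-38, 39-46, 47-54
        xs.append(float(line[30:38]))
        ys.append(float(line[38:46]))
        zs.append(float(line[46:54]))
    if not xs:
        raise ValueError("No atoms found for the requested residues")
    n = len(xs)
    return (sum(xs) / n, sum(ys) / n, sum(zs) / n)
```

The returned centroid can then serve as the docking box center (Vina's `--center_x/y/z`).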

Script 5b (the old analysis script) is deprecated. Script 6 (trajectory analysis) has been replaced by an interactive UI panel in Colab: tick the checkboxes to launch any trajectory analysis (RMSD, PCA, H-bonds, RNA allostery, π–π stacking, etc.) without running additional scripts.

🖥️ Interactive Analysis: UI-Driven in Colab

All trajectory analysis is now performed via a UI panel in your Colab notebook:

Select and run only the analyses you need (checkbox UI).
All backend code is visible (for transparency/extensibility), but users only interact with the UI.
RNA-specialized analyses are available (Watson–Crick pairs, backbone Δexposure, ligand–RNA atom contacts, π–π stacking, and more).
Outputs are auto-saved as images, CSVs, and PDBs for downstream reporting or 3D visualization.

(See the demo Colab notebook for details and examples.)
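Watson–Crick base pair persistence is usually reported as the fraction of frames in which a pair stays hydrogen-bonded. The sketch below is a deliberately simplified, hypothetical version: it uses a single N1–N3 donor/acceptor distance per pair and an assumed 3.5 Å cutoff, whereas a fuller check would also test the other Watson–Crick hydrogen bonds and the pair geometry. Pair labels such as "G5-C20" are illustrative.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def pair_persistence(
    traj: Dict[str, Sequence[Tuple[Vec, Vec]]], cutoff: float = 3.5
) -> Dict[str, float]:
    """Fraction of frames in which each pair's N1-N3 distance is below `cutoff` (Å).

    `traj` maps a pair label to per-frame (N1 coordinates, N3 coordinates).
    """
    out = {}
    for pair, frames in traj.items():
        formed = sum(1 for n1, n3 in frames if math.dist(n1, n3) < cutoff)
        out[pair] = formed / len(frames) if frames else 0.0
    return out
```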

🧪 Dataset Information

This demo uses a COVID-19-specific subset of BindingDB, available from UC San Diego.
To simplify setup, we provide pre-cleaned versions of this dataset:

BindingDB_Covid-19.tsv (214 MB, hosted via Zenodo)
strong_binders_cleaned.csv (optional for filtering)
protein.faa (optional for filtering)
metrics - SARS2 FASTA.csv (maps SARS-CoV-2 proteins to their FASTA sequences)

📎 Dataset download link: https://doi.org/10.5281/zenodo.15613825
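Affinities in BindingDB are given in nM and sometimes carry qualifiers such as ">10000". The sketch below shows one way to convert them to a p-scale (-log10 of the molar value, i.e. 9 - log10(nM)) and keep strong binders. This is not how strong_binders_cleaned.csv was produced, since its criteria are not stated here. The column names "Ligand SMILES" and "IC50 (nM)" and the threshold of 7 (100 nM) are assumptions; check the header of your copy of BindingDB_Covid-19.tsv.

```python
import math
from typing import Iterable, List, Optional, Tuple


def to_p_affinity(value: str) -> Optional[float]:
    """Convert an affinity in nM (e.g. '25') to -log10(M); None if unusable."""
    v = value.strip()
    if not v or v[0] in "<>":
        return None  # censored values such as '>10000' are skipped in this sketch
    try:
        nm = float(v)
    except ValueError:
        return None
    if nm <= 0:
        return None
    return 9.0 - math.log10(nm)  # equals -log10(nm * 1e-9)


def strong_binders(
    rows: Iterable[dict],
    smiles_col: str = "Ligand SMILES",
    affinity_col: str = "IC50 (nM)",
    threshold: float = 7.0,
) -> List[Tuple[str, float]]:
    """Return (SMILES, p-affinity) for rows at or above `threshold`."""
    out = []
    for row in rows:
        p = to_p_affinity(row.get(affinity_col) or "")
        if p is not None and p >= threshold:
            out.append((row[smiles_col], p))
    return out


# Usage (assumed column names):
# import csv
# with open("BindingDB_Covid-19.tsv", newline="") as fh:
#     hits = strong_binders(csv.DictReader(fh, delimiter="\t"))
```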

You are welcome to use your own SMILES/FASTA data by modifying 7_deeppurpose_training.py.
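If you bring your own targets as FASTA (or reuse protein.faa), the sequences first need to be read into plain strings. How 7_deeppurpose_training.py loads its inputs is not documented here, so the reader below is a hypothetical, stdlib-only helper. It keys each record by the first word of its header.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {header ID: sequence}.

    The ID is the text after '>' up to the first whitespace; multi-line
    sequences are concatenated.
    """
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


# Usage:
# with open("protein.faa") as fh:
#     targets = read_fasta(fh)
```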

🧬 Features

Fully RNA-capable: Run all stages (including MD, contact analysis, allostery, π–π stacking) for RNA targets.
Interactive, modular trajectory analysis: UI lets you select any combination of analyses, including custom, RNA-specific ones.
Robust error handling: clear feedback if input files/definitions are missing.
All outputs are automatically saved (figures, CSV, PDB).
Colab-native: No installation required outside the notebook.
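π–π stacking between a ligand ring and an RNA base is commonly detected geometrically, from the distance between ring centroids and the angle between ring planes. The sketch below is a simplified, hypothetical version covering face-to-face stacking only (no T-shaped or lateral-offset checks). It takes the ring normal from the first three ring atoms, which assumes a planar ring. The 5.5 Å and 30° thresholds are illustrative assumptions, not the pipeline's settings.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def ring_geometry(atoms: Sequence[Vec]) -> Tuple[Vec, Vec]:
    """Centroid and unit normal of a planar ring (normal from the first three atoms)."""
    n = len(atoms)
    centroid = tuple(sum(a[k] for a in atoms) / n for k in range(3))
    normal = _cross(_sub(atoms[1], atoms[0]), _sub(atoms[2], atoms[0]))
    length = math.sqrt(sum(x * x for x in normal))
    return centroid, tuple(x / length for x in normal)


def is_stacked(
    ring_a: Sequence[Vec],
    ring_b: Sequence[Vec],
    max_dist: float = 5.5,
    max_angle_deg: float = 30.0,
) -> bool:
    """True if centroids are close and ring planes are nearly parallel."""
    ca, na = ring_geometry(ring_a)
    cb, nb = ring_geometry(ring_b)
    dist = math.dist(ca, cb)
    # abs() makes the test independent of which way each normal points
    cos_angle = abs(sum(x * y for x, y in zip(na, nb)))
    angle = math.degrees(math.acos(min(1.0, cos_angle)))
    return dist <= max_dist and angle <= max_angle_deg
```

Applied frame by frame, the fraction of frames where `is_stacked` holds gives a simple stacking-persistence measure.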

📜 License

MIT License. Please cite this repository if used in academic work.

📖 Citation

If you use DeepPurpose-MD in your work, please cite:

Mochizuki, I. (2025). DeepPurpose-MD: An End-to-End Colab-Based Drug Discovery Pipeline Integrating Docking, Molecular Dynamics, and Deep Learning. Zenodo. https://doi.org/10.5281/zenodo.15613825

📫 Contact

Maintained by BioMolDynamics. For academic inquiries, collaboration, or feedback, please open an issue.
