****************************************************************************
* ██████╗ ██╗ ██╗██╗ ██╗███████╗██████╗ ██████╗ ██████╗██╗ ██╗ *
* ██╔══██╗██║ ██║╚██╗ ██╔╝██╔════╝██╔══██╗██╔═══██╗██╔════╝██║ ██╔╝ *
* ██████╔╝███████║ ╚████╔╝ ███████╗██║ ██║██║ ██║██║ █████╔╝ *
* ██╔═══╝ ██╔══██║ ╚██╔╝ ╚════██║██║ ██║██║ ██║██║ ██╔═██╗ *
* ██║ ██║ ██║ ██║ ███████║██████╔╝╚██████╔╝╚██████╗██║ ██╗ *
* ╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚══════╝╚═════╝ ╚═════╝ ╚═════╝╚═╝ ╚═╝ *
****************************************************************************
Implemented by Kexin Zhang (zhangkx2022@shanghaitech.edu.cn) & Jiale Yu (yujl2022@shanghaitech.edu.cn).
PhysDock is a non-equivariant, physics-guided, all-atom denoising diffusion model for predicting flexible protein-ligand complex structures. Its architecture is inspired by AlphaFold3, Llama3 and StableDiffusion3.
[Figure: PhysDock Overview]
Currently, PhysDock supports multiple protein chains but only a single small-molecule ligand as input.
- PhysDock searches multiple sequence alignments (MSAs) for the receptor protein sequences, so the search tools (HH-suite and HMMER) must be installed and five sequence databases must be prepared.
# Install Bio Tools
apt install hhsuite hmmer
# Download Databases
sh scripts/download_homo_datasets.sh <target_databases> # For example, $HOME/libs
- We provide a conda .yaml file for creating the environment needed to run PhysDock.
# Create ENV
conda env create -f enviroment.yaml
# ENV Activation
conda activate PhysDock
- Download the model parameters from Zenodo and move them to the params directory.
# Download latest EMA Params
cd params
wget https://zenodo.org/records/15178859/files/params.pt
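Before running the MSA search, it can help to confirm that the five databases from the first step are actually in place. The sketch below is a hypothetical helper, not part of PhysDock: it checks, under a <target_databases> root, the same relative paths that the system-preparation example passes to run_homo_search.py. It assumes the BFD and Uniclust30 paths are HH-suite database prefixes (several files sharing that prefix) and the other three are plain FASTA files.

```python
from pathlib import Path

# Relative paths taken from the run_homo_search.py example; the HH-suite
# databases (BFD, Uniclust30) are prefixes shared by several index/data files.
HHSUITE_PREFIXES = {
    "bfd": "bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
    "uniclust30": "uniclust30/uniclust30_2018_08/uniclust30_2018_08",
}
# Single-file FASTA databases searched with jackhmmer.
FASTA_FILES = {
    "uniref90": "uniref90.fasta",
    "mgnify": "mgy_clusters.fa",
    "uniprot": "uniprot.fasta",
}


def missing_databases(root):
    """Return the names of databases that cannot be found under `root`."""
    root = Path(root)
    missing = []
    for name, prefix in HHSUITE_PREFIXES.items():
        p = root / prefix
        # A prefix counts as present if at least one file starts with it;
        # globbing a non-existent directory simply yields nothing.
        if not any(p.parent.glob(p.name + "*")):
            missing.append(name)
    for name, rel in FASTA_FILES.items():
        if not (root / rel).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    import sys

    gone = missing_databases(sys.argv[1])
    print("missing:", ", ".join(gone) if gone else "none")
```

For example, `python check_dbs.py $HOME/libs` (the script name is hypothetical) prints the databases that still need downloading.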
PhysDock provides convenient inference scripts for two scenarios: redocking and virtual screening (VS). Both require each system to be preprocessed first; the cached files produced by preprocessing are then used for complex structure prediction and ranking.
In PhysDock, a system consists of one or more receptor chains together with a ligand. The inputs are a .pdb file for the receptor and a .sdf file for the ligand; the output is a raw feature file (.pkl.gz) used as model input, which mainly contains structure-related features. In addition to the structural features, PhysDock generates MSA-related features, namely msa_feature and uniprot_msa_feature, by searching all the sequence databases. The MD5 hash of each protein sequence serves as the lookup key, so MSA features computed once can be reused for any system containing the same sequence. The raw features can also include key interacting residues identified by PLIP, and during inference these high-importance key residues can be specified with a certain probability.
Below is an example of preparing the demo input system.
BASE=$(dirname $0)
# Generate systems pkl.gz
# Inputs: the receptor PDB file, the ligand SDF file and the ligand CCD ID.
# Output: a system `pkl.gz` file
python $BASE/prepare_system.py \
--receptor_pdb_path $BASE/demo/system_preparation/receptor.pdb \
--ligand_sdf_path $BASE/demo/system_preparation/EJQ.sdf \
--ligand_ccd_id EJQ \
--systems_dir $BASE/demo/system_preparation/systems
# Get MSA features
python $BASE/run_homo_search.py \
--input_fasta_path $BASE/demo/system_preparation/systems/fastas \
--features_dir $BASE/demo/system_preparation/features \
--bfd_database_path <target_databases>/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path <target_databases>/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--uniref90_database_path <target_databases>/uniref90.fasta \
--mgnify_database_path <target_databases>/mgy_clusters.fa \
--uniprot_database_path <target_databases>/uniprot.fasta \
--jackhmmer_binary_path /usr/bin/jackhmmer \
--hhblits_binary_path /usr/bin/hhblits
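The MD5-based reuse of MSA features described earlier can be illustrated with a short sketch. It is a hypothetical planner, not PhysDock's code: it hashes each chain's sequence and decides which sequences still need a database search. The normalization (stripping whitespace, upper-casing) is an assumption; PhysDock's own hashing of the sequence may differ, and how cached keys map to files in --features_dir is not stated.

```python
import hashlib


def sequence_key(sequence):
    """MD5 hex digest of a protein sequence, used as an MSA cache key."""
    # Assumption: whitespace is stripped and letters upper-cased before hashing.
    return hashlib.md5(sequence.strip().upper().encode("utf-8")).hexdigest()


def plan_msa_searches(chains, cached_keys):
    """Split chains into sequences that still need a search and cached ones.

    chains: mapping chain ID -> sequence.
    cached_keys: set of MD5 keys whose MSA features already exist.
    Returns (to_search, reused): to_search maps key -> sequence with one entry
    per distinct uncached sequence; reused lists chain IDs served from the cache.
    """
    to_search, reused = {}, []
    for chain_id, seq in chains.items():
        key = sequence_key(seq)
        if key in cached_keys:
            reused.append(chain_id)
        else:
            # Identical chains (e.g. homodimers) share a key, so they are searched once.
            to_search.setdefault(key, seq)
    return to_search, reused
```

The design point is that identical sequences, whether from two chains of one system or from different systems, trigger only one expensive jackhmmer/hhblits run.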
The redocking procedure is illustrated by the script redocking_demo.sh below.
# Run Demo
sh redocking_demo.sh
# redocking_demo.sh
BASE=$(dirname $0)
python $BASE/redocking.py \
-i $BASE/examples/demo/Posebusters_subset \
-f $BASE/examples/demo/features \
--crop_size 256 \
--atom_crop_size 2048 \
--enable_physics_correction \
--use_pocket \
--use_key_res \
--enable_ranking
# Get Help Info
python redocking.py -h
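The system-preparation section notes that PLIP key residues can be specified with a certain probability during inference; the --use_key_res flag presumably enables this, although that link is an assumption (check `python redocking.py -h`). The sketch below shows one simple interpretation, independent per-residue sampling. Whether PhysDock samples residues individually or all at once, and with what probability, is not stated, so `sample_key_residues` and its residue IDs are hypothetical.

```python
import random


def sample_key_residues(key_residues, prob, rng=None):
    """Keep each key residue independently with probability `prob`.

    key_residues: list of residue identifiers (format is hypothetical, e.g. "A:ASP86").
    rng: optional random.Random instance for reproducible sampling.
    """
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so prob=1.0 keeps every residue and prob=0.0 none.
    return [res for res in key_residues if rng.random() < prob]
```

Passing a seeded `random.Random(seed)` makes repeated inference runs use the same subset of key residues.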
The VS procedure is illustrated by the script screening_demo.sh below. Note that when screening against a specified pocket, the position of the ligand in the system .pkl.gz file (from the system preparation procedure) defines the search area for the pocket.
# Run Demo
sh screening_demo.sh
# Run full pipeline
sh docking_demo.sh
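In pocket-specified screening, the reference ligand pose stored in the system file defines the search area. A common way to turn a reference pose into a pocket is a distance cutoff around the ligand atoms, sketched below. This is a generic technique, not necessarily PhysDock's criterion; the 10 Å cutoff and the helper names are hypothetical.

```python
import math


def ligand_center(ligand_coords):
    """Centroid of the reference ligand atoms, given as (x, y, z) tuples."""
    n = len(ligand_coords)
    return tuple(sum(c[i] for c in ligand_coords) / n for i in range(3))


def pocket_residues(ligand_coords, residue_atoms, cutoff=10.0):
    """Return IDs of residues with at least one atom within `cutoff` Å of the ligand.

    ligand_coords: list of (x, y, z) ligand atom positions from the reference pose.
    residue_atoms: mapping residue ID -> list of (x, y, z) atom positions.
    """
    selected = []
    for res_id, atoms in residue_atoms.items():
        # math.dist (Python 3.8+) gives the Euclidean distance between two points.
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_coords):
            selected.append(res_id)
    return selected
```

Using the minimum atom-atom distance rather than the distance to the centroid keeps residues near the ends of long, extended ligands inside the pocket.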
Other application scenarios, such as blind docking, cross docking or standard precision (SP) flexible docking, can be run by passing different arguments to the Python scripts.
The training dataset, the preprocessed benchmark datasets and the prediction results presented in the paper can be downloaded from Zenodo with the following scripts.
- PhysDock achieves state-of-the-art (SOTA) redocking results on three benchmarks.
- The preprocessed benchmark datasets can be downloaded with the following script.
# Download all preprocessed benchmark datasets: PoseBusters, DeepDockingDare and PhiBench.
sh scripts/download_benchmarks.sh
- The training and validation datasets can be downloaded with the following script.
sh scripts/download_dataset.sh
To be continued.