The current code works on Linux/macOS only; you will need to modify file paths to run it on Windows.
biopandas 0.2.8
biopython 1.80
Cython 0.29.32
deepspeed 0.5.9
dgl 0.9.1.post1
dgllife 0.3.0
joblib 1.1.0
pytorch-lightning 1.8.3.post1
PyTorch-metric 0.0.0
pytz 2022.6
PyYAML 6.0
rdkit 2022.9.2
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
scikit-learn 0.24.2
scipy 1.9.3
six 1.16.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.5.1
termcolor 2.1.1
threadpoolctl 3.1.0
torch 1.12.0
torchaudio 0.12.0
torchmetrics 0.10.3
torchvision 0.13.0
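To sanity-check that the pinned packages above are actually installed, a minimal standard-library sketch (the `PINNED` subset here is illustrative; extend it with any of the packages listed above):

```python
from importlib import metadata

# A few of the packages pinned above; add more entries as needed.
PINNED = {
    "torch": "1.12.0",
    "scipy": "1.9.3",
    "PyYAML": "6.0",
}

def check_versions(pinned):
    """Return a dict mapping package name to (installed_version, expected_version)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is missing entirely
        report[name] = (installed, expected)
    return report

if __name__ == "__main__":
    for name, (installed, expected) in check_versions(PINNED).items():
        status = "OK" if installed == expected else "MISMATCH"
        print(f"{name}: installed={installed}, expected={expected} [{status}]")
```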
The datasets are all set up in the dataset directory. The data includes:
- AB-benchmark
- benchmark5.5
- FD1.0 based on DIPS
The OneDrive shared link is: (see the uploaded datasets link.)
The raw PDB files of the DB5.5 dataset are in the directory ./dataset/XXX/structures.
Set the path and the dataset:
args['cache_path'] = 'dataset/cache/unbound_ab'
args['data'] = 'ab'
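The two `args` entries above follow a common pattern (`unbound_<tag>` under the cache root), which can be sketched as a small helper. The set of valid dataset tags is an assumption inferred from this README (`'ab'` here, `'db5'` later); the helper itself is hypothetical, not part of the repo:

```python
# Assumed dataset tags, inferred from the configuration snippets in this README.
VALID_DATA = {"ab", "db5", "dips"}

def make_args(data, cache_root="dataset/cache"):
    """Build the dataset-related args entries (hypothetical helper)."""
    if data not in VALID_DATA:
        raise ValueError(f"unknown dataset tag: {data!r}")
    return {"data": data, "cache_path": f"{cache_root}/unbound_{data}"}

args = make_args("ab")
# args == {'data': 'ab', 'cache_path': 'dataset/cache/unbound_ab'}
```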
Then preprocess the raw data as follows to prepare it for rigid-body docking:
# Prepare data for FlexPPD
python pre_process.py
By default, pre_process.py uses 10 neighbors for each node when constructing the graph and uses only residues (coordinates being those of the alpha carbons). After running pre_process.py, you will get the following ready-for-training data directory:
FlexPPD/dataset/cache/unbound_XXX
with files:
$ ls FlexPPD/dataset/cache/unbound_XXX
label_test.pkl
label_train.pkl
label_val.pkl
ligand_graph_test.bin
ligand_graph_train.bin
ligand_graph_val.bin
receptor_graph_test.bin
receptor_graph_train.bin
receptor_graph_val.bin
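The cache layout above is regular enough to verify programmatically: three splits, one label pickle per split, and one graph binary per split for each of the ligand and receptor sides. A small sketch (the helper names are mine, not the repo's):

```python
from pathlib import Path

SPLITS = ("train", "val", "test")

def expected_cache_files():
    """File names pre_process.py is expected to produce, per the listing above."""
    names = [f"label_{s}.pkl" for s in SPLITS]
    names += [f"{side}_graph_{s}.bin"
              for side in ("ligand", "receptor") for s in SPLITS]
    return sorted(names)

def missing_cache_files(cache_dir):
    """Return the expected files that are absent from cache_dir."""
    cache = Path(cache_dir)
    return [n for n in expected_cache_files() if not (cache / n).exists()]
```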
FD1.0 is based on DIPS.
On GPU (works also on CPU, but it's very slow):
CUDA_VISIBLE_DEVICES=0 python -m src.train -hyper_search
or just specify your own parameters if you don't want to run a hyperparameter search. This will create checkpoints and TensorBoard logs (which you can visualize with TensorBoard), and will store all stdout/stderr in a log file. The model is trained on DIPS first and then fine-tuned on DB5. Use -toy to train on DB5 only.
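The two flags mentioned above (-hyper_search and -toy) can be sketched as an argparse CLI. Only the flag names come from this README; the parser itself is an illustration, not the repo's actual code:

```python
import argparse

def build_parser():
    # Single-dash long options, matching the invocation shown above.
    parser = argparse.ArgumentParser(prog="src.train")
    parser.add_argument("-hyper_search", action="store_true",
                        help="run a hyperparameter search instead of fixed params")
    parser.add_argument("-toy", action="store_true",
                        help="train on DB5 only (skip DIPS pretraining)")
    return parser

opts = build_parser().parse_args(["-hyper_search"])
# opts.hyper_search is True, opts.toy is False
```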
In our paper, we used the train/validation/test splits given by the files:
DIPS: /FlexPPD/dataset/DIPS/cv/*.txt
DB5: /FlexPPD/dataset/DB5/cv/*.txt
See main.py.
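The split files above can be loaded with a small helper. This is a sketch: the one-complex-ID-per-line layout of the *.txt files is an assumption, and the helper is not part of the repo:

```python
from pathlib import Path

def read_split_ids(cv_dir):
    """Map each split file's stem (e.g. 'train') to its list of complex IDs.

    Assumes one ID per line in each *.txt file under cv_dir.
    """
    splits = {}
    for txt in sorted(Path(cv_dir).glob("*.txt")):
        ids = [ln.strip() for ln in txt.read_text().splitlines() if ln.strip()]
        splits[txt.stem] = ids
    return splits
```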
Our paper's pretrained models are available in the checkpts/ folder.
To train the model, for example on DB5.5:
- Set the path of the dataset and the path to save the checkpoint:
args['data'] = 'db5'
args['cache_path'] = 'dataset/cache/unbound_db5'
- Set the main function:
if __name__ == '__main__':
    train(args)
- To train the model:
python main.py
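The three training steps above can be sketched end to end. Here train is a stub standing in for the repo's real function, used only to show how the args dict and the __main__ guard fit together:

```python
def train(args):
    # Stub for the repo's train(); the real one runs the training loop.
    return f"trained {args['data']} model, cache at {args['cache_path']}"

# Step 1: dataset paths for DB5.5
args = {"data": "db5", "cache_path": "dataset/cache/unbound_db5"}

# Steps 2-3: main.py's __main__ guard dispatches to train(args)
if __name__ == "__main__":
    print(train(args))
```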
To test the model, for example on DB5.5:
- Set the path of the dataset and the path to save the checkpoint:
args['data'] = 'db5'
args['cache_path'] = 'dataset/cache/unbound_db5'
- Set the main function:
if __name__ == '__main__':
    test(args)
- Set the checkpoint filename for the best model:
args['checkpoint_filename'] = 'checkpts/db5/db5_model_best.pth'
- To test the model:
python main.py
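The test-side steps mirror training but add the checkpoint path. A sketch that sanity-checks args['checkpoint_filename'] before dispatching; the helper and the .pth-extension check are my assumptions, not the repo's code:

```python
from pathlib import Path

def check_checkpoint(path):
    """Basic sanity check on the checkpoint filename before testing."""
    p = Path(path)
    if p.suffix != ".pth":
        raise ValueError(f"expected a .pth checkpoint, got {p.name}")
    return p

args = {
    "data": "db5",
    "cache_path": "dataset/cache/unbound_db5",
    "checkpoint_filename": "checkpts/db5/db5_model_best.pth",
}

ckpt = check_checkpoint(args["checkpoint_filename"])
# ckpt.name == 'db5_model_best.pth'
```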