HICAI-ZJU/FlexPPD_code
FlexPPD: Flexible Protein-Protein Docking via Learning Multi-level Structures

Dependencies

The current code works on Linux/macOS only; to run it on Windows you will need to modify the file paths.

biopandas 0.2.8
biopython 1.80
Cython 0.29.32
deepspeed 0.5.9
dgl 0.9.1.post1
dgllife 0.3.0
joblib 1.1.0
pytorch-lightning 1.8.3.post1
PyTorch-metric 0.0.0
pytz 2022.6
PyYAML 6.0
rdkit 2022.9.2
requests 2.28.1
requests-oauthlib 1.3.1
rsa 4.9
scikit-learn 0.24.2
scipy 1.9.3
six 1.16.0
tensorboard 2.11.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorboardX 2.5.1
termcolor 2.1.1
threadpoolctl 3.1.0
torch 1.12.0
torchaudio 0.12.0
torchmetrics 0.10.3
torchvision 0.13.0

Dataset

The datasets are all set up in the dataset directory. The data includes:

  • AB-benchmark
  • benchmark5.5
  • FD1.0 based on DIPS

The datasets are shared via OneDrive; see the uploaded datasets link for the URL.

The raw PDB files of the DB5.5 dataset are in the directory ./dataset/XXX/structures. Set the cache path and the dataset name:

    args['cache_path'] = 'dataset/cache/unbound_ab'

    args['data'] = 'ab'

Then preprocess the raw data as follows to prepare data for rigid body docking:

# Prepare data for FlexPPD
python pre_process.py

By default, pre_process.py uses 10 neighbors per node when constructing the graph and works at the residue level (each node's coordinates are those of the residue's alpha carbon). After running pre_process.py, you will get the following ready-for-training data directory:

FlexPPD/dataset/cache/unbound_XXX

with files:

$ ls FlexPPD/dataset/cache/unbound_XXX
label_test.pkl           label_train.pkl           label_val.pkl
ligand_graph_test.bin    ligand_graph_train.bin    ligand_graph_val.bin
receptor_graph_test.bin  receptor_graph_train.bin  receptor_graph_val.bin
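The 10-nearest-neighbor graph construction described above can be sketched as follows. This is an illustrative reconstruction, not the repository's actual pre_process.py code: the function name knn_edges and the use of plain NumPy are assumptions.

```python
import numpy as np

def knn_edges(ca_coords: np.ndarray, k: int = 10):
    """Return (src, dst) edge lists connecting each residue node to its
    k nearest neighbors by alpha-carbon distance (self excluded)."""
    n = ca_coords.shape[0]
    # Pairwise Euclidean distances between alpha-carbon coordinates.
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)      # exclude self-loops
    k_eff = min(k, n - 1)               # guard against tiny chains
    nbrs = np.argsort(dist, axis=1)[:, :k_eff]
    src = np.repeat(np.arange(n), k_eff)
    dst = nbrs.ravel()
    return src, dst

coords = np.random.rand(30, 3) * 20.0   # 30 residues with toy coordinates
src, dst = knn_edges(coords, k=10)
```

In the actual pipeline, the resulting edge lists would be wrapped into DGL graphs and serialized to the .bin files listed above.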

FD1.0 Data

The FD1.0 dataset is based on DIPS.

Training

On GPU (also runs on CPU, but very slowly):

CUDA_VISIBLE_DEVICES=0 python -m src.train -hyper_search

Alternatively, specify your own parameters if you do not want to run a hyperparameter search. Training creates checkpoints and TensorBoard logs (viewable with TensorBoard), and stores all stdout/stderr in a log file. The model is trained on DIPS first and then fine-tuned on DB5; use -toy to train on DB5 only.

Data Splits

In our paper, we used the train/validation/test splits given by the files:

DIPS: /FlexPPD/dataset/DIPS/cv/*.txt
DB5: /FlexPPD/dataset/DB5/cv/*.txt
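A split file can be read as in the sketch below. We assume each cv/*.txt file lists one complex identifier per line; the actual format should be checked against the files in the repository, and the temporary file here only stands in for a real split file.

```python
from pathlib import Path
import tempfile

def read_split(path):
    """Return the non-empty, stripped lines of a split file."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]

# Toy demonstration: a temporary file standing in for e.g. dataset/DB5/cv/train.txt.
with tempfile.TemporaryDirectory() as tmp:
    split_file = Path(tmp) / "train.txt"
    split_file.write_text("1AHW\n1BVK\n\n1DQJ\n")  # made-up split contents
    ids = read_split(split_file)
```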

Training Models

See main.py.

Pretrained Models

Our paper's pretrained models are available in the checkpts/ folder.

Training Models

To train the model, for example on DB5.5:

  1. Set the dataset name and the dataset cache path:
    args['data'] = 'db5'
    args['cache_path'] = 'dataset/cache/unbound_db5'
  2. Set the main function:
    if __name__ == '__main__':
        train(args)
  3. Train the model:
    python main.py
    

Testing Models

To test the model, for example on DB5.5:

  1. Set the dataset name and the dataset cache path:
    args['data'] = 'db5'
    args['cache_path'] = 'dataset/cache/unbound_db5'
  2. Set the main function:
    if __name__ == '__main__':
        test(args)
  3. Set the checkpoint filename of the best model:
    args['checkpoint_filename'] = 'checkpts/db5/db5_model_best.pth'
  4. Run the test:
    python main.py
    
