We recommend creating a dedicated conda environment:
conda create -n PreDDG python=3.12
conda activate PreDDG
pip install numpy pandas scipy scikit-learn tqdm
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu118
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.5.0+cu118.html
pip install tensorboard tensorboardX pytorch_lightning
pip install torch_geometric fair-esm
pip install biopython
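After installation, a quick import check can confirm that the environment is complete. The sketch below is not part of PreDDG: the helper name check_packages is hypothetical, and the list only covers the top-level packages installed above. Some import names differ from their pip names (scikit-learn is imported as sklearn, fair-esm as esm, biopython as Bio).

```python
from importlib.util import find_spec

# Import names for the packages installed above. Some differ from the pip
# names: scikit-learn -> sklearn, fair-esm -> esm, biopython -> Bio.
REQUIRED = [
    "numpy", "pandas", "scipy", "sklearn", "tqdm",
    "torch", "torchvision", "torchaudio",
    "torch_geometric", "esm", "Bio",
    "tensorboard", "tensorboardX", "pytorch_lightning",
]


def check_packages(names):
    """Return {import_name: True/False} without importing the packages."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(REQUIRED).items():
        print(f"{name:20s} {'ok' if found else 'MISSING'}")
```

Run it inside the activated PreDDG environment; any package reported as MISSING can be reinstalled with the matching pip command above.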
Datasets should be placed under the ./data/dataset/ directory. The folder structure should be as follows:
data/
  dataset/
    M28/
      mutations/
        M28.csv
Download ISM-650M-UC30PDB and place it in the ./data/ism/ism_t33_650M_uc30pdb/ directory:
data/
  ism/
    ism_t33_650M_uc30pdb/
      config.json
      gitattributes
      ism_t33_650M_uc30pdb.pth
      model.safetensors
      special_tokens_map.json
      tokenizer_config.json
      vocab.txt
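To catch an incomplete download before running predictions, the listing above can be checked with a few lines of Python. This is a minimal sketch, not part of PreDDG: missing_files is a hypothetical helper, the file names are copied from the listing, and the gitattributes entry is left out on the assumption that it is repository metadata rather than a model file. Which of these files PreDDG actually loads is not stated here, so treat the list as a conservative check.

```python
from pathlib import Path

# Directory and file names copied from the listing above.
ISM_DIR = Path("data/ism/ism_t33_650M_uc30pdb")
ISM_FILES = [
    "config.json",
    "ism_t33_650M_uc30pdb.pth",
    "model.safetensors",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "vocab.txt",
]


def missing_files(directory, expected):
    """Return the expected file names that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(ISM_DIR, ISM_FILES)
    print("all files present" if not missing else f"missing: {missing}")
```

Run the script from the repository root so that the relative path resolves correctly.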
Example: predicting on the M28 dataset. Input files should be .csv files that follow one of the two layouts below:
Format 1 (with both wild-type and mutant sequences):
| pdb_id | wt_seq | mut_info | mut_seq |
|---|---|---|---|
Format 2 (only mutation info provided):
| pdb_id | wt_seq | mut_info |
|---|---|---|
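The two layouts can be told apart from the header row alone. The following sketch is an illustration rather than PreDDG's own loader: detect_format is a hypothetical helper, and it assumes a comma-separated file whose header contains exactly the columns shown above, in that order. predict.py may be more permissive, for example about extra columns.

```python
import csv

FORMAT_1 = ["pdb_id", "wt_seq", "mut_info", "mut_seq"]
FORMAT_2 = ["pdb_id", "wt_seq", "mut_info"]


def detect_format(csv_path):
    """Return 1 or 2 depending on which column layout the CSV header matches."""
    with open(csv_path, newline="") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    columns = [name.strip() for name in header]
    if columns == FORMAT_1:
        return 1
    if columns == FORMAT_2:
        return 2
    raise ValueError(f"unexpected columns: {columns}")
```

For example, detect_format("data/dataset/M28/mutations/M28.csv") would report which layout the M28 input file uses, or raise ValueError if the header matches neither.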
Note:
- mut_info follows the format WT_POS_MUT (wild-type residue, position, mutant residue, written without separators), e.g., Y68R means that the residue at position 68 changes from Y to R.
- Multiple mutations are separated by a colon (:), e.g., Y68R:A120V.
- mut_seq is optional. If not provided, it is computed from wt_seq and mut_info.
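The derivation of mut_seq from wt_seq and mut_info described in the note can be sketched as follows. This is not the code used by predict.py: apply_mutations is a hypothetical helper, and it assumes that positions are 1-based and index directly into wt_seq. The sequence in the usage example is a made-up four-residue toy, not real data.

```python
def apply_mutations(wt_seq, mut_info):
    """Build the mutant sequence from wt_seq and a string such as 'Y68R:A120V'.

    Assumes 1-based positions that index directly into wt_seq.
    """
    residues = list(wt_seq)
    for mutation in mut_info.split(":"):
        # First character: wild-type residue; last: mutant; middle: position.
        wt_aa, position, mut_aa = mutation[0], int(mutation[1:-1]), mutation[-1]
        index = position - 1  # convert to a 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation}: position {position} is outside the sequence")
        if wt_seq[index] != wt_aa:
            raise ValueError(
                f"{mutation}: expected {wt_aa} at position {position}, "
                f"found {wt_seq[index]}"
            )
        residues[index] = mut_aa
    return "".join(residues)


if __name__ == "__main__":
    print(apply_mutations("MKYA", "Y3R:A4V"))  # prints MKRV
```

Checking the wild-type residue against wt_seq before substituting catches off-by-one numbering, which is a common problem when mutation positions follow PDB residue numbering rather than sequence indices.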
cd PreDDG
python predict.py --test_name='M28' --device='cuda:0'
Predictions are saved under ./data/dataset/M28/predictions/. Example output:
| pdb_id | wt_seq | mut_info | mut_seq | preddg |
|---|---|---|---|---|
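For downstream analysis, the predictions can be read back with the standard library. The sketch below rests on assumptions that the text above does not confirm: that the output files are comma-separated .csv files with the header shown in the example output, and that they sit directly in the predictions directory. The output file name is not specified here, so the hypothetical helper load_predictions simply reads every .csv file it finds.

```python
import csv
from pathlib import Path


def load_predictions(pred_dir):
    """Collect (pdb_id, mut_info, preddg) tuples from every CSV in pred_dir."""
    rows = []
    for csv_path in sorted(Path(pred_dir).glob("*.csv")):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                rows.append((row["pdb_id"], row["mut_info"], float(row["preddg"])))
    return rows


if __name__ == "__main__":
    for pdb_id, mut_info, preddg in load_predictions("data/dataset/M28/predictions"):
        print(pdb_id, mut_info, preddg)
```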
For more details, please refer to the paper and source code.
If you find PreDDG useful, please cite our paper:
@article{
title={},
author={},
journal={},
year={}
}