
haiyuan-yu-lab/structural-interactome-features


structural-interactome-features

Toolkit to derive structural features for interacting proteins

The pipeline is inspired by PrePPI. However, this project does not focus on predicting interactions; instead, it computes structurally informed features for proteins and pairs of proteins.

Installation instructions

Install all the required Python packages by running:

pip install -r requirements.txt

External Requirements

  • AlphaFold
  • CD-HIT
  • CDD (we provide a script to perform the search, but you must install CDD's requirements and download the data yourself)
  • ska (we provide bash scripts to perform the alignment, but we do not have permission to redistribute the software; you must obtain a license to run it)
  • PeSTo (we provide bash scripts to run PeSTo, assuming a suitable Docker image is installed on your system; you can change this by configuring the environment variables in the .env file)

Configuration Variables

We rely on multiple external tools to run the entire pipeline. For example, we recommend creating Docker containers to run PeSTo and AlphaFold and, depending on your system, it may make more sense to invoke CD-HIT and ska from their own isolated Docker containers as well.

To allow for a composable pipeline, we rely on environment variables to control such cases:

  • AF_PREDICT: Set this to the command that runs python docker/run_docker.py from the AlphaFold pipeline. Our script API simply wraps this command call.
  • PESTO_PREDICT: Similar to the above, this is the command needed to run PeSTo's model. In PeSTo's repository, the model is unfortunately provided as a Jupyter notebook. You may use this fork to create your own Docker container, or override the command yourself.
  • SKA_BIN: If ska is installed on your system, this should simply point to the ska binary (set ska's own environment variables beforehand). We recommend using a Docker container instead.
  • CDHIT_BIN: Similar to SKA_BIN, this should point to the CD-HIT binary, or to the docker run command that invokes it. We only take care of passing the arguments.
  • CDD_BIN: Similar to SKA_BIN, this should point to the rpsblast binary, or to the docker run command that invokes it. We only take care of passing the arguments.

For a full list of the required variables, see the example .env file provided in this repository.
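As an illustration of how such variables can be consumed, the following is a minimal sketch and not the code used by this repository. It assumes each variable holds either a plain binary path or a full wrapper command such as a docker run invocation. The helper name build_command, the image name, and the file names are hypothetical.

```python
import os
import shlex


def build_command(var_name, args, env=None):
    """Compose an argv list from an environment variable plus tool arguments.

    The variable may hold a plain binary path ("/usr/bin/cd-hit") or a full
    wrapper command ("docker run --rm image cd-hit"), so it is split with
    shell-like rules instead of being treated as a single path.
    """
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if not value:
        raise KeyError(f"{var_name} is not set; see the example .env file")
    return shlex.split(value) + [str(a) for a in args]


# Hypothetical values, for illustration only.
example_env = {"CDHIT_BIN": "docker run --rm my-cdhit-image cd-hit"}
cmd = build_command(
    "CDHIT_BIN",
    # -c 0.6 clusters at 60% identity; CD-HIT expects word size -n 4
    # for thresholds between 0.6 and 0.7.
    ["-i", "pdb_seqres.fasta", "-o", "pdb_c60", "-c", "0.6", "-n", "4"],
    env=example_env,
)
print(cmd)
```

The resulting list can be passed to subprocess.run(cmd, check=True). Splitting the variable with shlex.split keeps the same call site working whether the tool is installed locally or wrapped in a container.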

Pipeline steps

  1. Download the required datasets. If you have already downloaded the files, create a configuration file before continuing.
  2. Run CD-HIT to cluster PDB sequences at 60% identity (both full sequences and domains).
  3. Build the BLAST database for CDD domains.
  4. Run rpsblast against CDD on the target sequences to identify domains in the queries.
  5. Extract domain sequences from the BLAST results.
  6. Get AlphaFold models from AlphaFoldDB (or run AlphaFold for proteins that do not have an entry available through the API).
  7. Generate the structural neighborhood.
  8. Calculate structural features.
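Steps 4 and 5 can be sketched as follows. This is a hedged illustration, not the repository's implementation: it assumes rpsblast was run with the standard 12-column tabular output (-outfmt 6), where columns 7 and 8 are the query start and end. The function names, the sequence, and the domain identifier are hypothetical.

```python
def parse_tabular_hits(lines):
    """Yield (query_id, domain_id, start, end) from BLAST tabular output."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        yield fields[0], fields[1], min(qstart, qend), max(qstart, qend)


def extract_domains(sequences, hits):
    """Slice each hit out of its query sequence.

    BLAST coordinates are 1-based and inclusive, so the equivalent
    Python slice is [start - 1:end].
    """
    domains = {}
    for query_id, domain_id, start, end in hits:
        seq = sequences[query_id]
        domains[f"{query_id}_{domain_id}_{start}-{end}"] = seq[start - 1:end]
    return domains


# Made-up input, for illustration only.
sequences = {"P1": "MKTAYIAKQRQISFVKSHFSRQ"}
blast_lines = ["P1\tCDD:123\t45.0\t10\t0\t0\t3\t12\t1\t10\t1e-5\t50.0"]
domains = extract_domains(sequences, parse_tabular_hits(blast_lines))
print(domains)
```

Including the coordinates in each key keeps multiple domains from the same query distinct, which matters when the extracted sequences are later clustered.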

For more details, please read the main.sh script provided in this repository.
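The fallback logic in step 6 can be sketched as below. This is a minimal sketch under stated assumptions, not what main.sh does: it assumes the public AlphaFoldDB prediction endpoint shown in AFDB_API, and that a missing entry is reported as HTTP 404 or an empty list. The names query_alphafold_db and get_model, and the predict callable, are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumed endpoint of the public AlphaFoldDB API.
AFDB_API = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"


def query_alphafold_db(accession, timeout=30):
    """Return AlphaFoldDB metadata for a UniProt accession, or None if absent."""
    url = AFDB_API.format(accession=accession)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            records = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: missing entries answer with 404
            return None
        raise
    return records or None  # treat an empty list as "no entry"


def get_model(accession, query=query_alphafold_db, predict=None):
    """Prefer a precomputed model; otherwise fall back to a local prediction.

    `predict` is a hypothetical callable that would wrap the AF_PREDICT command.
    """
    records = query(accession)
    if records is not None:
        return "alphafolddb", records
    if predict is None:
        raise LookupError(f"no AlphaFoldDB entry for {accession} and no predictor given")
    return "predicted", predict(accession)
```

Passing query and predict as arguments lets the lookup be exercised without network access or a GPU, for example get_model("P2", query=lambda a: None, predict=lambda a: a + ".pdb").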
