GitHub - haiyuan-yu-lab/ECLAIR

Input Format

The pipeline only requires a tab-delimited text file with the pairs of interacting proteins that will be used to predict interfaces.

Q6VAB6	Q02750
Q6VAB6	P28482
Q6VAB6	P15056
Q6VAB6	P54646

Output Format

Once the pipeline is done running, a CSV file will be generated per interaction in the chosen output folder.

Column	Contents
P1	Protein Partner 1 (Alphabetical)
P2	Protein Partner 2 (Alphabetical)
Prot	Indicated whether this row refers to the first (0) or the second (1) protein partner
Pos	1-indexed Amino Acid Position that this row refers to
Res	Expected Amino Acid at this position
TopClf	Which of the 8 ECLAIR classifiers was used for this prediction
Pred	Probability of this being an interface residue
Interface Prediction	Binned probability of this being an interface residue (Very Low / Low / Medium / High / Very High)

P1,P2,Prot,Pos,Res,TopClf,Pred,Interface Prediction
P15056,Q6VAB6,0,1,M,0,0.0,Very Low
P15056,Q6VAB6,0,2,A,0,0.0,Very Low
P15056,Q6VAB6,0,3,A,0,0.0,Very Low
P15056,Q6VAB6,0,4,L,0,0.0,Very Low
P15056,Q6VAB6,0,5,S,2,0.00169406074475,Very Low
P15056,Q6VAB6,0,6,G,2,0.00286604835082,Very Low
P15056,Q6VAB6,0,7,G,2,0.00219393392881,Very Low
P15056,Q6VAB6,0,8,G,2,0.000384057971014,Very Low
P15056,Q6VAB6,0,9,G,2,0.0,Very Low

Running the script

The script requires a specific anaconda environment to run: ecalir.txt

Note that running the script is extremely resource intensive. As a rule of thumb, allocate 10GB of RAM per core used.

Code The code below outlines the main script that orchestrates the data parsing subroutines for ECLAIR. As you can see in the code, the process is broken into 8 steps, each of which will run many different helper scripts.

The individual helper scripts can be used independently to generate individual features. ECLAIR Pipeline Steps

STEP 0 Code Setup Sets up logging behavior which can be accessed at /home/jc2375/outputs/get_features.log
STEP 1 List new UniProt Ids Downloads the latest information for the UniProt IDs in the file and appends them to a cached uniprot_info.txt file
STEP 2 ExPasy and Pfam features Extracts information about biophysical features, isoforms, and Pfam domains
STEP 3 Conservation Features Generates multiple sequence alignment files and calculates both Jensen-Shannon Conservation and Coevolution via DCA and SCA
STEP 4 MODBASE Models Downloads existing MODBASE models and calculates surface residues
STEP 5 PDB Structures Downloads existing PDB models and calculates surface residues
STEP 6 DOCKING Runs several trials of molecular docking to generate paired structural features when available
STEP 7 Make Interaction Files Compiles all information parsed above into ECLAIR-compatible pandas dataframes
STEP 8 Predict Runs the ECLAIR classifier and saves results as either pickled dataframes or csv files

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
TPE		TPE
scripts		scripts
ECLAIR.py		ECLAIR.py
LICENSE.txt		LICENSE.txt
README.md		README.md
eclair.txt		eclair.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Input Format

Output Format

Running the script

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

haiyuan-yu-lab/ECLAIR

Folders and files

Latest commit

History

Repository files navigation

Input Format

Output Format

Running the script

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages