AltinLab/tcrtrifold-experiments

Extracting raw cognate triads from IEDB and VDJDB

The raw data extracted from IEDB and VDJDB are already available in this repo in data/iedb-vdjdb/raw.

To re-run our data extraction instead, clone our fork of the IEDB_IMMREP data repo, check out the new-categories branch, and run the data extraction script:

git clone https://github.com/ljwoods2/IEDB_IMMREP.git
cd IEDB_IMMREP
git checkout new-categories
python setup.py install
chmod +x run.sh
./run.sh

Formatting triads from IEDB and VDJDB

Unique, formatted triads are available by category (species and MHC class) in data/iedb-vdjdb/iedb and data/iedb-vdjdb/vdjdb, in the format described in tcr_format_parsers. All categories are also available in Parquet format with duplicates retained (non-unique), so that per-record database metadata is preserved.
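To illustrate how the unique files relate to the non-unique Parquet tables, here is a minimal sketch of deduplicating records by their sequence fields. The column names (`peptide`, `mhc_seq`, `tcr_alpha_seq`, `tcr_beta_seq`, `source_id`) and the toy records are hypothetical placeholders, not the schema defined by tcr_format_parsers; substitute the actual field names before using it.

```python
# Hypothetical key columns; the real schema is defined by tcr_format_parsers.
KEY_FIELDS = ("peptide", "mhc_seq", "tcr_alpha_seq", "tcr_beta_seq")


def unique_triads(rows, key_fields=KEY_FIELDS):
    """Return one record per distinct combination of key fields.

    The first occurrence wins and input order is preserved. Metadata
    columns (anything outside key_fields) are dropped, which is why a
    separate non-unique table is needed to keep them.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(key_fields, key)))
    return unique


# Toy records with placeholder sequences (not real data).
records = [
    {"peptide": "AAA", "mhc_seq": "MMM", "tcr_alpha_seq": "CCC",
     "tcr_beta_seq": "DDD", "source_id": "db_record_1"},
    {"peptide": "AAA", "mhc_seq": "MMM", "tcr_alpha_seq": "CCC",
     "tcr_beta_seq": "DDD", "source_id": "db_record_2"},
    {"peptide": "EEE", "mhc_seq": "MMM", "tcr_alpha_seq": "CCC",
     "tcr_beta_seq": "DDD", "source_id": "db_record_3"},
]

deduplicated = unique_triads(records)
```

In this toy input the first two records differ only in `source_id`, so they collapse into a single unique triad while the non-unique table would keep both.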

To re-run our formatting and non-cognate triad creation code, first create a conda environment containing the necessary dependencies:

conda env create -n af3-analyzer -f envs/af3-analyzer.yaml

Then, run each cell sequentially in data/iedb-vdjdb/reformat.ipynb using the af3-analyzer environment as the kernel.

Running AF3 inference for triads from IEDB and VDJDB

Our lab used the Nextflow pipelines in the af3-nf repo to run inference on triads. These pipelines were designed to run on TGen's Gemini supercomputer, but they can be adapted to run in other environments. Please contact the authors for details.

See data/iedb-vdjdb/iedb/human_I/run_af3_triad.sh for an example Slurm script that runs the pipelines.

Identifying IEDB and VDJDB overlap with PDB

BLAST+ alignment against PDB

conda env create -n blast -f envs/blast.yaml
conda activate blast
cd /path/to/pdbaa/dir
update_blastdb.pl --decompress pdbaa
cd /path/to/tcrtrifold-experiments/data/iedb-vdjdb
blastp -query fasta_queries/all_triads.fasta -db /path/to/pdbaa/dir/pdbaa -out pdb_blast_results/blast_result.csv -outfmt 10
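With `-outfmt 10`, BLAST+ writes a headerless CSV whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The sketch below shows one way such a file could be loaded and filtered for exact matches. The function names, the 100% identity threshold, and the toy rows are assumptions for illustration and are not taken from the repo's analysis code.

```python
import csv
import io

# Default column order for BLAST+ tabular output (-outfmt 6 and -outfmt 10).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast_csv(handle):
    """Parse headerless BLAST CSV rows into dicts with typed numeric fields."""
    hits = []
    for row in csv.DictReader(handle, fieldnames=BLAST_COLUMNS):
        row["pident"] = float(row["pident"])
        row["length"] = int(row["length"])
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits.append(row)
    return hits


def filter_hits(hits, min_pident=100.0):
    """Keep hits at or above an identity threshold (threshold is an assumption)."""
    return [hit for hit in hits if hit["pident"] >= min_pident]


# Toy rows with made-up identifiers, standing in for
# pdb_blast_results/blast_result.csv.
example = io.StringIO(
    "query_1,subject_A,100.000,9,0,0,1,9,1,9,1.2e-05,25.0\n"
    "query_2,subject_B,88.889,9,1,0,1,9,1,9,0.5,18.1\n"
)
hits = read_blast_csv(example)
exact = filter_hits(hits)
```

To apply this to the real output, pass an open file instead of the in-memory example, for instance `open("pdb_blast_results/blast_result.csv", newline="")`. Note that percent identity covers only the aligned region, so an overlap analysis would likely also compare `length` against the full query length.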

Formatting cognate triads from PDB

Unique, formatted PDB triads are available in data/pdb/pdb_triads.csv. Non-unique, formatted triads are available in parquet format.

Raw PDB summary files in data/pdb/raw come from STCRDab.

To re-run our formatting code, first clone the IMGTHLA repo, which is used to identify the likely MHC allele for each sequence:

git clone https://github.com/ANHIG/IMGTHLA

Then, set the variable IMGT_HLA_PATH in data/pdb/reformat.ipynb to the path of your cloned IMGTHLA repo and run each cell sequentially, using the af3-analyzer environment as the kernel.
