GitHub - AltinLab/tcrtrifold-experiments

Extracting raw cognate triads from IEDB and VDJDB

The raw data extracted from IEDB and VDJDB is already available in this repo in data/iedb-vdjdb/raw.

Alternatively, if you're interested in re-running our data extraction, clone our fork of the IEDB_IMMREP data repo and run the data extraction script.

git clone https://github.com/ljwoods2/IEDB_IMMREP.git
cd IEDB_IMMREP
git checkout new-categories
python setup.py install
chmod +x run.sh
./run.sh

Formatting triads from IEDB and VDJDB

Unique, formatted triads are available by category (species and MHC class) in data/iedb-vdjdb/iedb and data/iedb-vdjdb/vdjdb, in the format described in tcr_format_parsers. Every category is also available in parquet format with duplicates retained (non-unique), so that per-entry DB metadata can be stored.
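The relationship between the non-unique parquet tables and the unique triad files can be illustrated with a small sketch: several metadata-bearing rows can describe the same TCR-peptide-MHC triad, and collapsing on the triad columns yields the unique set. The column names (tcr_alpha, tcr_beta, peptide, mhc, source) and the toy sequences are hypothetical and do not follow the tcr_format_parsers layout exactly; this is not the code in reformat.ipynb.

```python
from typing import NamedTuple


class Triad(NamedTuple):
    # Hypothetical field names chosen for illustration only.
    tcr_alpha: str
    tcr_beta: str
    peptide: str
    mhc: str


def unique_triads(records: list[dict]) -> list[Triad]:
    """Collapse non-unique rows (which may differ only in DB metadata) into unique triads.

    A dict is used as an ordered set, so first-seen order is preserved.
    """
    seen: dict[Triad, None] = {}
    for rec in records:
        key = Triad(rec["tcr_alpha"], rec["tcr_beta"], rec["peptide"], rec["mhc"])
        seen.setdefault(key, None)
    return list(seen)


# Made-up toy rows: the first two describe the same triad reported by two databases.
records = [
    {"tcr_alpha": "CAVRDGF", "tcr_beta": "CASSLGQF",
     "peptide": "GILGFVFTL", "mhc": "HLA-A*02:01", "source": "IEDB"},
    {"tcr_alpha": "CAVRDGF", "tcr_beta": "CASSLGQF",
     "peptide": "GILGFVFTL", "mhc": "HLA-A*02:01", "source": "VDJDB"},
    {"tcr_alpha": "CAMREGF", "tcr_beta": "CASSPTQF",
     "peptide": "NLVPMVATV", "mhc": "HLA-A*02:01", "source": "IEDB"},
]

for triad in unique_triads(records):
    print(triad)
```

Keeping the metadata only in the non-unique table avoids having to decide which database's annotations "win" when the same triad appears more than once.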

If you're interested in re-running our formatting and non-cognate triad creation code, first create a conda environment containing the necessary dependencies:

conda env create -n af3-analyzer --file envs/af3-analyzer.yaml

Then, run each cell sequentially in data/iedb-vdjdb/reformat.ipynb using the af3-analyzer environment as the kernel.
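One common way to build non-cognate triads is to pair each TCR with a peptide-MHC it is not recorded as binding. The sketch below shows that idea under stated assumptions: cognate pairs are given as (tcr, pmhc) tuples with hypothetical string IDs, and decoys are sampled uniformly with a fixed seed. It is not necessarily the strategy used in reformat.ipynb.

```python
import random


def make_non_cognate(cognate: list[tuple[str, str]], n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Sample up to n unique (tcr, pmhc) pairs that do not appear in the cognate set."""
    rng = random.Random(seed)
    known = set(cognate)
    tcrs = sorted({t for t, _ in cognate})
    pmhcs = sorted({p for _, p in cognate})
    # Every known pair is drawn from tcrs x pmhcs, so this counts the available decoys.
    available = len(tcrs) * len(pmhcs) - len(known)
    target = min(n, available)
    decoys: set[tuple[str, str]] = set()
    while len(decoys) < target:
        pair = (rng.choice(tcrs), rng.choice(pmhcs))
        if pair not in known:
            decoys.add(pair)
    return sorted(decoys)


# Hypothetical IDs for illustration.
cognate = [
    ("TCR1", "GILGFVFTL/A*02:01"),
    ("TCR2", "NLVPMVATV/A*02:01"),
    ("TCR3", "GILGFVFTL/A*02:01"),
]
print(make_non_cognate(cognate, n=10))
```

Note that a pair produced this way is only unobserved, not experimentally confirmed as non-binding.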

Running AF3 inference for triads from IEDB and VDJDB

Our lab used the Nextflow pipelines in the af3-nf repo to run inference on triads. These pipelines were designed to run on TGen's Gemini supercomputer but can be adapted to other environments; please contact the authors for details.

See data/iedb-vdjdb/iedb/human_I/run_af3_triad.sh for an example Slurm script that runs the pipelines.
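For readers adapting the workflow, the sketch below builds a single AlphaFold 3 input in the JSON format accepted by AF3 (name, modelSeeds, sequences, dialect "alphafold3", version 1). The chain IDs, the chain order for a class I triad, and the truncated placeholder sequences are assumptions for illustration; af3-nf may construct its inputs differently.

```python
import json


def af3_input(name: str, chains: dict[str, str], seeds: tuple[int, ...] = (1,)) -> dict:
    """Build an AlphaFold 3 input dict with one protein entry per chain."""
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": [
            {"protein": {"id": chain_id, "sequence": seq}}
            for chain_id, seq in chains.items()
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


# Hypothetical chain layout; the sequences other than the peptide are truncated placeholders.
payload = af3_input(
    "example_triad",
    {
        "A": "GSHSMRYF",    # MHC class I heavy chain (placeholder fragment)
        "B": "IQRTPKIQVY",  # beta-2-microglobulin (placeholder fragment)
        "C": "GILGFVFTL",   # peptide
        "D": "CAVRDGF",     # TCR alpha (placeholder fragment)
        "E": "CASSLGQF",    # TCR beta (placeholder fragment)
    },
)
print(json.dumps(payload, indent=2))
```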

Identifying IEDB and VDJDB overlap with PDB

BLAST+ alignment against PDB

conda env create -n blast --file envs/blast.yaml
conda activate blast

cd /path/to/pdbaa/dir
update_blastdb.pl --decompress pdbaa

cd data/iedb-vdjdb
blastp -query fasta_queries/all_triads.fasta -db /path/to/pdbaa/dir/pdbaa -out pdb_blast_results/blast_result.csv -outfmt 10

Formatting cognate triads from PDB

Unique, formatted PDB triads are available in data/pdb/pdb_triads.csv. Non-unique, formatted triads are available in parquet format.

Raw PDB summary files in data/pdb/raw come from STCRDab.

If you're interested in re-running our formatting code, first clone the IMGTHLA repo (this is used to identify likely MHC alleles for each sequence):

git clone https://github.com/ANHIG/IMGTHLA

Then, run each cell sequentially in data/pdb/reformat.ipynb using the af3-analyzer environment as the kernel, making sure to modify the variable IMGT_HLA_PATH with your own path to the cloned IMGTHLA repo.
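To illustrate how a reference like IMGTHLA can suggest likely alleles, the sketch below parses a protein FASTA and returns every allele whose sequence contains the structure's MHC sequence exactly. Structures often contain only a truncated extracellular domain, hence the substring test, and several alleles can match, hence "likely". The allele names and sequences are made up; a file such as hla_prot.fasta in the cloned repo is an assumed input, and the notebook's actual matching logic may differ (for example, mutated constructs would need an alignment-based score instead).

```python
import io
from typing import Iterable


def read_fasta(handle: Iterable[str]) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def likely_alleles(query: str, alleles: dict[str, str]) -> list[str]:
    """Return headers of reference alleles whose sequence contains the query exactly."""
    return [h for h, seq in alleles.items() if query in seq]


# Hypothetical two-allele reference with wrapped sequence lines.
alleles = read_fasta(io.StringIO(
    ">X*01:01\nGSHSMRYFYT\nAVSRPGRGEP\n"
    ">X*01:02\nGSHSMRYFFT\nSVSRPGRGEP\n"
))
print(likely_alleles("MRYFFTSV", alleles))
```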
