nicwulab/Ab_allele_polymorphism

Analyzing the effect of allelic polymorphisms on antibody binding activity

This README describes the analysis in:
Widespread impact of immunoglobulin V gene allelic polymorphisms on antibody reactivity

Dependencies

Dependencies Installation

Install the dependencies with conda:

conda create -n Abs \
  python=3.9 \
  biopython \
  pandas \
  igblast \
  anarci

Input files

Local PyIR setup

PyIR: An IgBLAST wrapper and parser

pip3 install crowelab_pyir

Set up the database in the PyIR library directory

pyir setup

Manually install the IMGT reference database

  1. Download the sequences from http://www.imgt.org/vquest/refseqh.html#VQUEST

  2. Copy and paste the sequences and save them as FASTA (all V genes in one file, all D genes in one file, all J genes in one file)

  3. Clean the data (the raw edit_imgt_file.pl script can be found in igblast-1.17.1xxx/bin)

edit_imgt_file.pl imgt_database/human_prot/imgt_raw/IGV.fasta > imgt_database/human_prot/IGV.fasta

  4. Create the database (use "-dbtype prot" for protein sequences and "-dbtype nucl" for DNA sequences). For example:

makeblastdb -parse_seqids -dbtype prot -in imgt_database/human_prot/IGV.fasta

makeblastdb -parse_seqids -dbtype nucl -in imgt_database/human_nuc/IGV.fasta

  5. Run PyIR for IgBLAST
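
The two makeblastdb calls above differ only in the -dbtype value and the input path. As a minimal sketch (the helper names `makeblastdb_command` and `build_databases` are hypothetical and not part of this repository), the commands could be assembled from Python. Nothing is executed unless `run=True`, and running them assumes makeblastdb is on your PATH:

```python
import subprocess


def makeblastdb_command(fasta_path, dbtype):
    """Build the makeblastdb argument list shown in this README.

    dbtype must be "prot" (protein) or "nucl" (DNA).
    Hypothetical helper; not part of the repository.
    """
    if dbtype not in ("prot", "nucl"):
        raise ValueError(f"dbtype must be 'prot' or 'nucl', got {dbtype!r}")
    return ["makeblastdb", "-parse_seqids", "-dbtype", dbtype, "-in", str(fasta_path)]


def build_databases(run=False):
    """Return the protein and nucleotide commands; execute them only if run=True."""
    commands = [
        makeblastdb_command("imgt_database/human_prot/IGV.fasta", "prot"),
        makeblastdb_command("imgt_database/human_nuc/IGV.fasta", "nucl"),
    ]
    if run:
        for cmd in commands:
            # check=True raises CalledProcessError if makeblastdb fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling `build_databases()` without arguments only returns the argument lists, which is a convenient way to check the paths before running anything.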

PDB to Paratope

  1. Run DSSP to find the list of amino acid positions that interact with antigens
    python pdb_to_paratope.py {pdb_dir} {summary_file} {mkdssp_dir}
    for example: python pdb_to_paratope.py /Users/natalieso/Downloads/20230217_0084705/ data/20230217_0084705_summary.tsv /Users/natalieso/Downloads/dssp-3.1.4/mkdssp

Allelic Variant at Paratope

  1. Extract PDB sequences from PDB files
    python utils/get_sequence_from_pdb_file.py {pdb_dir} {output_file} {summary_file}
    for example: python utils/get_sequence_from_pdb_file.py /Users/natalieso/Downloads/20230217_0084705/ results/pdb_to_paratope/pdb_sequences.fasta data/20230217_0084705_summary.tsv
  2. Remove X's from the FASTA file
    python utils/remove_x_in_fasta.py
  3. Run PyIR
    pyir results/pdb_sequences.fasta --sequence_type prot --legacy --germlineV imgt_database/human_prot/IGV.fasta -s human
    This generates a zip file. Unzip the JSON file, rename it to pdb_sequences.json, and move it to the results/pdb_to_paratope directory.

  4. Get list of DSSP positions for all PDB files
    python utils/get_dssp.py {pdb_dir} {summary_file} {mkdssp_dir}
    for example: python utils/get_dssp.py /Users/natalieso/Downloads/20230217_0084705/ data/20230217_0084705_summary.tsv /Users/natalieso/Downloads/dssp-3.1.4/mkdssp

  5. Get allelic variant at paratope
    python allelic_variant_at_paratope.py
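
The manual unzip, rename, and move of the PyIR output described in the Run PyIR step could be scripted. The following is only a sketch under stated assumptions: the helper name `extract_pyir_json` is hypothetical, and the archive is assumed to be either a zip file holding exactly one .json member or a gzip-compressed JSON file. Check what your PyIR version actually writes before relying on it.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def extract_pyir_json(archive_path, dest_path="results/pdb_to_paratope/pdb_sequences.json"):
    """Decompress a PyIR output archive and write the JSON to dest_path.

    Assumption: the archive is a zip file with one .json member,
    or a gzip-compressed JSON file. Hypothetical helper, not in the repo.
    """
    archive_path = Path(archive_path)
    dest_path = Path(dest_path)
    dest_path.parent.mkdir(parents=True, exist_ok=True)

    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            members = [name for name in zf.namelist() if name.endswith(".json")]
            if len(members) != 1:
                raise ValueError(f"expected exactly one .json member, found {members}")
            with zf.open(members[0]) as src, open(dest_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
    else:
        # Fall back to treating the file as gzip-compressed JSON
        with gzip.open(archive_path, "rb") as src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return dest_path
```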

Compute DDG

  1. Run FoldX on the list of PDB files. Run this in the directory where you want to store the FoldX outputs.
    python compute_ddG_foldx_script.py {pdb_dir}
    for example: python compute_ddG_foldx_script.py /Users/natalieso/Downloads/20230217_0084705/

    • Output files:
      • List of FoldX outputs containing DDG values
  2. Parse FoldX output and save DDG to CSV file.
    python compute_ddG.py {foldx output directory}
    for example: python compute_ddG.py /Users/natalieso/Downloads/foldx_remote/

    • Output files:
      • empty_ddg_rows.csv
      • FoldX.csv
        FoldX cannot handle positions with non-integer numbering, so FoldX.csv only includes DDG values for mutations whose target position is an integer. empty_ddg_rows.csv lists the rows for which DDG was not calculated. For those rows, the PDB files were modified to use integer-only positions, saved to /data/modified_pdb_files, and run through FoldX separately. The resulting DDG values were added to results/compute_ddG/compute_ddG_result.csv
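
To illustrate the split between FoldX.csv and empty_ddg_rows.csv, here is a minimal sketch of separating mutation targets by whether their position is a plain integer. The function names and the `(pdb_id, chain, residue_number)` tuple layout are assumptions made for illustration; the repository scripts may organize this differently.

```python
import re


def has_integer_position(residue_number):
    """Return True for positions like "52" and False for ones like "100A"."""
    return re.fullmatch(r"-?\d+", str(residue_number).strip()) is not None


def split_by_position(mutations):
    """Split (pdb_id, chain, residue_number) tuples into two lists.

    The first list holds integer positions that can go to FoldX directly;
    the second holds the rest, which need a renumbered PDB file first.
    """
    foldx_ready, deferred = [], []
    for mutation in mutations:
        if has_integer_position(mutation[2]):
            foldx_ready.append(mutation)
        else:
            deferred.append(mutation)
    return foldx_ready, deferred
```

For example, `split_by_position([("1abc", "H", "52"), ("1abc", "H", "100A")])` places the first tuple in the FoldX-ready list and the second in the deferred list.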

Epitope Identification

  1. Run epitope_identification.py to get the list of epitopes, along with a group ID, for each entry in results/compute_ddG/compute_ddG_result.csv.
    python epitope_identification.py {pdb_dir} {mkdssp_dir}
    for example: python epitope_identification.py /Users/natalieso/Downloads/20230217_0084705/ /Users/natalieso/Downloads/dssp-3.1.4/mkdssp

Baseline Variation

  1. Run baseline_variation.py to compute the baseline variation, using data/anarci_igv_output.csv_H.csv and data/anarci_igv_output.csv_KL.csv, for each entry in results/epitope_idenfication/epitope_identification_with_group_id.csv.
    python baseline_variation.py

Compute Antibody DDG

  1. Run utils/remove_chains.py to remove chains other than antibody chains for all PDBs
    python utils/remove_chains.py {pdb_dir} {summary_file} {output_dir}

  2. Run FoldX on the list of PDB files with only antibody chains. Run this in the directory where you want to store the FoldX outputs.
    python compute_ddG_foldx_script.py {pdb_dir}
    for example: python compute_ddG_foldx_script.py /Users/natalieso/Downloads/pdb_antibodies_only/

    • Output files:
      • List of FoldX outputs containing DDG values
  3. Parse FoldX output and save DDG to CSV file.
    python compute_ddG_antidody.py {foldx output directory}
    for example: python compute_ddG_antidody.py /Users/natalieso/Downloads/dir_antibodies_only_foldx_output/

    • Output files:
      • empty_ddg_rows.csv
      • FoldX.csv
        FoldX cannot handle positions with non-integer numbering, so FoldX.csv only includes DDG values for mutations whose target position is an integer. empty_ddg_rows.csv lists the rows for which DDG was not calculated. For those rows, the PDB files were modified to use integer-only positions, saved to /data/modified_pdb_files_antibody_only, and run through FoldX separately. The resulting DDG values were added to results/epitope_identification_with_antibody_only_ddgs.csv
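
This README does not describe how the PDB files were modified to integer-only positions. One simple possibility, shown here purely as a sketch, is to renumber the residues of a chain consecutively while keeping a map back to the original labels. The function name `renumber_to_integers` and the `(resseq, icode)` input layout are assumptions.

```python
def renumber_to_integers(residue_ids, start=1):
    """Map (resseq, icode) pairs, given in chain order, to consecutive integers.

    Returns a dict {(resseq, icode): new_number}. Keeping this map makes it
    possible to relate DDG values back to the original positions.
    Hypothetical helper; the authors' actual renumbering may differ.
    """
    mapping = {}
    for offset, residue_id in enumerate(residue_ids):
        if residue_id in mapping:
            raise ValueError(f"duplicate residue id: {residue_id}")
        mapping[residue_id] = start + offset
    return mapping


# Original numbering with insertion codes at position 100
original = [(99, ""), (100, ""), (100, "A"), (100, "B"), (101, "")]
new_numbers = renumber_to_integers(original, start=99)
```

Here residue 100A becomes 101 and the original residue 101 becomes 103, so any mutation list has to be translated with the same map before it is passed to FoldX.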

Add resolution, RSA, and antigen species information

  1. Run add_antibody_dssp.py to get RSA from DSSP of the antibody apo form for each entry in results/epitope_identification_with_antibody_only_ddgs.csv.
    python add_antibody_dssp.py {pdb_dir} {mkdssp_dir}
    for example: python add_antibody_dssp.py /Users/natalieso/Downloads/20230217_0084705/ /Users/natalieso/Downloads/dssp-3.1.4/mkdssp

  2. Run add_antigen_ID.py to add resolution and antigen species information.
    python add_antigen_ID.py

  3. Run count_PDB.py to count PDB IDs.
    python count_PDB.py

  4. Generate summary statistics of the dataset.
    python summary_stats.py

Plotting

  1. Plot the antigen species and distribution of resolution of the analyzed structures.
    Rscript plot_summary.R

  2. Plot the distribution of DDG.
    Rscript plot_DDG_dist.R

  3. Plot the allele usage for IGV genes of interest among antibodies in GenBank.
    Rscript plot_allele_usage.R

  4. Plot DDG distribution for allelic variants of interest.
    Rscript plot_antigen_germline.R
