nicwulab/Mos99_NA_DMS

Deep mutational scanning of human influenza H3N2 NA

This README describes the analysis of the deep mutational scanning experiment for H3N2 A/Moscow/10/1999 (Mos99) neuraminidase (NA).

Dependencies

Input files

Dependencies installation

  1. Install dependencies with conda:
    conda create -n NA -c bioconda -c anaconda -c conda-forge \
      python=3.9 \
      seqtk \
      flash \
      biopython \
      cutadapt \
      snakemake \
      prody

  2. Activate the conda environment:
    conda activate NA

Calculating mutational fitness from sequencing data

  1. Use unique molecular identifiers (UMIs) to correct sequencing errors:
    python3 script/Dedup_UMI.py fastq NNNNNNN 0.8 2

  2. Count mutations:
    snakemake -s script/Mos99_pipeline.smk -j 10

  3. Convert counts to fitness:
    python3 script/count2fitness.py
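
Step 1 above collapses reads that share a UMI into one error-corrected sequence. The sketch below illustrates the general idea only; it is not the code in `script/Dedup_UMI.py`. It assumes that `NNNNNNN` denotes a 7-nt UMI at the start of each read, that `0.8` is the minimum fraction of reads that must agree at a position, and that `2` is the minimum number of reads per UMI. The README does not document these meanings, and the function name `umi_consensus` is hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, umi_len=7, min_frac=0.8, min_reads=2):
    """Group reads by their leading UMI and build one consensus per UMI.

    Assumed parameter meanings (not confirmed by the README):
      umi_len   - length of the UMI at the 5' end of each read
      min_frac  - minimum fraction of reads agreeing on a base
      min_reads - minimum number of reads required per UMI
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read[:umi_len]].append(read[umi_len:])

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to correct errors reliably
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # Mask positions where the majority base is not dominant enough.
            bases.append(base if count / len(seqs) >= min_frac else "N")
        consensus[umi] = "".join(bases)
    return consensus
```

A UMI supported by a single read is dropped, and a position where fewer than 80% of reads agree is masked as `N` rather than guessed.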

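Step 3 above turns read counts into a fitness value per mutation. A common convention in deep mutational scanning is the log enrichment of a mutant relative to wild type between the input and post-selection libraries. The sketch below assumes that convention, with a pseudocount to avoid division by zero. The exact formula and normalization in `script/count2fitness.py` may differ, and the function and argument names are hypothetical.

```python
import math


def log_enrichment(mut_in, mut_out, wt_in, wt_out, pseudocount=1):
    """Log10 enrichment of a mutant relative to wild type.

    mut_in / mut_out: mutant read counts before / after selection.
    wt_in / wt_out:   wild-type read counts before / after selection.
    The pseudocount is an assumption made here for numerical safety.
    """
    mut_ratio = (mut_out + pseudocount) / (mut_in + pseudocount)
    wt_ratio = (wt_out + pseudocount) / (wt_in + pseudocount)
    return math.log10(mut_ratio / wt_ratio)
```

Under this convention a value of 0 means the mutant behaves like wild type, and negative values indicate depletion during selection.
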
Data analysis

  1. Compute mutational tolerance for each residue
    python3 script/Mean_mut_fit_per_resi.py

  2. Assign residue type and calculate relative solvent accessibility (RSA)
    python3 script/pos_type_analysis.py

  3. Calculate distance to active site for each residue
    python3 script/Dist_analysis.py

  4. Calculate natural mutation frequency
    python3 script/natural_mut_analysis.py
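
Step 1 above summarizes mutational tolerance per residue. One simple way to do this, assumed here, is to average the fitness of all amino acid substitutions at each position. The sketch below uses hypothetical mutation IDs of the form `R118K`; the input format and any filtering used by `script/Mean_mut_fit_per_resi.py` are not described in this README and may differ.

```python
import re
from collections import defaultdict
from statistics import mean


def mean_fitness_per_residue(mut_fitness):
    """Average fitness over all substitutions at each position.

    mut_fitness maps mutation IDs such as "R118K" to fitness values
    (a hypothetical format chosen for this illustration).
    """
    by_pos = defaultdict(list)
    for mut, fit in mut_fitness.items():
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None or m.group(1) == m.group(3):
            continue  # skip malformed IDs and silent entries
        by_pos[int(m.group(2))].append(fit)
    return {pos: mean(vals) for pos, vals in sorted(by_pos.items())}
```

A position whose substitutions all have low fitness gets a low mean, which marks it as mutationally intolerant.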

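Step 4 above computes how often each mutation occurs in natural sequences. A minimal sketch, assuming the natural NA sequences are already aligned to the Mos99 reference with the same length and numbering, counts the fraction of sequences that carry each substitution. The function name and the treatment of gaps and ambiguous residues are assumptions made for this illustration, not taken from `script/natural_mut_analysis.py`.

```python
from collections import Counter


def natural_mut_freq(reference, aligned_seqs):
    """Fraction of aligned sequences carrying each substitution.

    reference:    reference protein sequence (string).
    aligned_seqs: sequences aligned to the reference, same length.
    Gaps ("-") and ambiguous residues ("X") are ignored (an assumption).
    """
    counts = Counter()
    for seq in aligned_seqs:
        for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1):
            if aa != ref_aa and aa not in ("-", "X"):
                counts[f"{ref_aa}{pos}{aa}"] += 1
    n = len(aligned_seqs)
    return {mut: c / n for mut, c in counts.items()}
```

Mutations that never appear in the alignment are absent from the result, so they can be treated as naturally unobserved in downstream comparisons.
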
Plotting

  1. Plots for checking data quality
    Rscript script/plot_QC.R

  2. Compare the data in this study with our previous study (Wang et al. 2021)
    Rscript script/plot_cross_valid.R

  3. Heatmap of mutational fitness
    Rscript script/plot_fitness_heatmap.R

  4. Compare RSA and fitness across residue types
    Rscript script/plot_pos_type_analysis.R

  5. Plot correlation between fitness and distance to active site
    Rscript script/plot_dist_to_active_site.R

  6. Plot correlation between fitness and natural mutation frequency
    Rscript script/plot_natural_mut_fit.R

  7. Plot DMS fitness against the stability effect predicted by FoldX and the fitness predicted by MSA Transformer
    Rscript script/plot_fit_vs_predict.R

  8. Compare the distribution of fitness effects between naturally observed and unobserved mutations
    Rscript script/plot_fit_conserved.R

  9. Plot sequence logos for residues in cluster 2
    python3 script/cluster2_seqlogo.py