- ./Fasta/SARS2-NTD.fa: NTD sequences with 21 nt upstream (5' flank)
- ./Fasta/Amplicon.fa: Amplicon sequences for NGS (including recovery primer regions)
- ./Fasta/NTD_ref.fa: Reference (i.e. wild type) amino acid sequences (primer regions not included)
- ./PDB/6zge.pdb: Spike cryo-EM structure from Wrobel et al. (2020)
- ./PDB/6zge_RBD.pdb: RBD only, extracted from the Spike cryo-EM structure from Wrobel et al. (2020)
- ./PDB/7b62.pdb: NTD crystal structure from Rosa et al. (2021)
- ./PDB/spike_with_complete_NTD.pdb: 6zge with the NTD in chain A replaced by 7b62 (chain X)
- ./data/ASA.table: Maximum accessible surface area for individual amino acids from Tien et al. (2013)
- ./data/RBD_DMS_data.csv: DMS data of RBD from Chan et al. (2021)
- Raw read files in FASTQ format from the NIH SRA database (BioProject PRJNA792013)
-
Generating forward (NNK + internal barcode) and reverse (constant) primers
python3 script/lib_primer_design.py
- Input file:
- Output files:
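As a hedged illustration of what an NNK position encodes (this is not the logic of script/lib_primer_design.py, which is not shown here), the sketch below enumerates the codons matched by NNK, where N is any base and K is G or T. The function name `expand_nnk` and the way the codon table is built are assumptions made for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_nnk():
    """Return every codon matched by the degenerate codon NNK."""
    return ["".join(codon) for codon in product("ACGT", "ACGT", "GT")]


nnk_codons = expand_nnk()
encoded = {CODON_TABLE[codon] for codon in nnk_codons}
stop_codons = [codon for codon in nnk_codons if CODON_TABLE[codon] == "*"]
print(len(nnk_codons), "codons,", len(encoded - {"*"}), "amino acids")
print("stop codons:", stop_codons)
```

NNK is a common choice for saturation mutagenesis because it covers all 20 amino acids while admitting only one stop codon.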
-
Generating barcode file
python3 script/check_barcode.py
- Input files:
- Output file:
-
Merge overlapping paired-end reads using PEAR
pear -f [FASTQ FILE FOR FORWARD READ] -r [FASTQ FILE FOR REVERSE READ] -o [OUTPUT FASTQ FILE]
- Output files should be placed in the fastq_merged/ folder and named as described in ./doc/filename_merged_fastq.txt
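When many read pairs need merging, the PEAR call above can be assembled programmatically. The following is a minimal sketch only: the helper `pear_command` and the file names are hypothetical, and the real output names should follow ./doc/filename_merged_fastq.txt.

```python
import shlex


def pear_command(forward_fastq, reverse_fastq, output_name):
    """Build the PEAR argument list for one pair of read files."""
    return ["pear", "-f", forward_fastq, "-r", reverse_fastq, "-o", output_name]


# Hypothetical file names, for illustration only.
cmd = pear_command("sample_R1.fastq", "sample_R2.fastq", "fastq_merged/sample")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.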
-
Counting variants based on nucleotide sequences
python3 script/NTD_fastq2count.py
- Input files:
- Merged read files in fastq_merged/ folder
- Output files:
- result/NTD_DMS_count_nuc_A.tsv
- result/NTD_DMS_count_nuc_B.tsv
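The idea behind nucleotide-level counting can be sketched as follows. This is an assumption-laden illustration, not the contents of script/NTD_fastq2count.py: it simply tallies identical read sequences and ignores primer trimming, barcodes, and quality filtering. The function name `count_fastq_sequences` is hypothetical.

```python
from collections import Counter
from io import StringIO


def count_fastq_sequences(handle):
    """Count identical read sequences in a FASTQ stream (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts[line.strip()] += 1
    return counts


# Tiny in-memory example; real input would be a file in fastq_merged/.
example = StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nTTGA\n+\nIIII\n"
)
counts = count_fastq_sequences(example)
print(counts.most_common())
```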
-
Convert nucleotide sequences to amino acid mutations
python3 script/NTD_count_nuc2aa.py
- Input files:
- ./data/barcodes.tsv
- ./Fasta/NTD_ref.fa
- result/NTD_DMS_count_nuc_A.tsv
- result/NTD_DMS_count_nuc_B.tsv
- Output files:
- result/NTD_DMS_count_aa_A.tsv
- result/NTD_DMS_count_aa_B.tsv
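Converting a nucleotide variant into amino acid mutations amounts to translating both the reference and the variant and comparing them position by position. The sketch below shows that idea under stated assumptions: the names `translate` and `call_aa_mutations`, the `first_residue` numbering parameter, and the toy sequences are all hypothetical, and barcode handling from ./data/barcodes.tsv is not modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nuc):
    """Translate an in-frame nucleotide sequence, ignoring a trailing partial codon."""
    usable = len(nuc) - len(nuc) % 3
    return "".join(CODON_TABLE[nuc[i:i + 3]] for i in range(0, usable, 3))


def call_aa_mutations(ref_nuc, var_nuc, first_residue=1):
    """List substitutions such as 'A2V' between reference and variant."""
    ref_aa, var_aa = translate(ref_nuc), translate(var_nuc)
    return [
        f"{r}{first_residue + i}{v}"
        for i, (r, v) in enumerate(zip(ref_aa, var_aa))
        if r != v
    ]


# Toy sequences: Met-Ala-Ser versus Met-Val-Ser.
mutations = call_aa_mutations("ATGGCTTCA", "ATGGTTTCA")
print(mutations)
```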
-
Compute expression score
python3 script/NTD_count2score.py
- Input files:
- result/NTD_DMS_count_aa_A.tsv
- result/NTD_DMS_count_aa_B.tsv
- Output file:
-
Plot correlation between replicates as quality control
Rscript script/plot_QC.R
- Input file:
- Output files:
- ./graph/QC_replicate_exp.png
- ./graph/Exp_by_class.png
- ./result/NTD_DMS_expression_score.tsv
Rscript script/plot_QC_mean_exp_by_replicates.R
- Input file:
- Output files:
- ./graph/QC_replicate_mean_exp.png
Rscript script/QC_bin0_vs_bin3_counts.R
- Input file:
- Output files:
- ./graph/bin0_bin_3_ratio_rep_1_all.png
- ./graph/bin0_bin_3_ratio_rep_2_all.png
Rscript script/QC_bin0_vs_bin3_biomodality.R
- Input file:
- Output files:
-
Plot heatmap for input frequencies of individual mutations
Rscript script/plot_input_freq_heatmap.R
- Input file:
- Output file:
-
Plot heatmap for the expression scores of individual mutations
Rscript script/plot_score_heatmap.R
- Input file:
- Output file:
-
Plot mutational tolerability in loops vs others
Rscript script/NTD_loop_vs_other_residues.R
- Input file:
- Output file:
-
Plot the mutational tolerability in selected hotspot regions
Rscript script/Hot_spots_vs_other_residues.R
- Input file:
- Output file:
-
Computing relative solvent accessibility (RSA) for individual residues
python3 script/RSA_analysis.py
- Input files:
- Output file:
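Relative solvent accessibility is conventionally the absolute accessible surface area of a residue divided by the maximum value for that amino acid type, which is what ./data/ASA.table supplies. A minimal sketch of that ratio follows. The function name and the two table entries are placeholders for illustration; real values should be read from ./data/ASA.table, and how script/RSA_analysis.py obtains the absolute ASA is not shown here.

```python
def relative_solvent_accessibility(asa, resname, max_asa):
    """Return ASA divided by the maximum ASA for the given residue type."""
    return asa / max_asa[resname]


# Placeholder entries; load the full table from ./data/ASA.table in practice.
max_asa = {"ALA": 129.0, "GLY": 104.0}

rsa = relative_solvent_accessibility(64.5, "ALA", max_asa)
print(f"RSA = {rsa:.2f}")
```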
-
Compute expression score for RBD DMS data
Rscript script/compute_exp_score_RBD.R
- Input file:
- Output file:
-
Compute RSA for RBD residues
python3 script/RBD_analysis.py
- Input files:
- Output file:
-
Computing the distance of individual NTD residues to RBD or S2
python3 script/Dist_analysis.py
- Input file:
- Output file:
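One way to define the distance from a residue to a region such as the RBD or S2 is the minimum distance between any pair of atoms. The sketch below uses that definition as an assumption; script/Dist_analysis.py may use a different one (for example, C-alpha or centroid distances). The function name and coordinates are hypothetical.

```python
from math import dist


def min_distance(coords_a, coords_b):
    """Smallest Euclidean distance between two sets of 3D coordinates."""
    return min(dist(a, b) for a in coords_a for b in coords_b)


# Toy coordinates in angstroms, for illustration only.
residue_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_atoms = [(4.0, 4.0, 0.0), (1.0, 3.0, 4.0)]

d = min_distance(residue_atoms, target_atoms)
print(f"minimum distance: {d:.1f}")
```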
-
Replace the B-factor by expression score in the PDB file
python3 script/Bfactor_to_score.py
- Input file:
- Output file:
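In the fixed-column PDB format, the B-factor occupies columns 61-66 and the residue number columns 23-26 of each ATOM record, so replacing it is a string substitution per line. The following is a sketch under assumptions, not the contents of script/Bfactor_to_score.py: the function name, the `default` value for residues without a score, and the example record are hypothetical.

```python
def set_bfactor(line, scores, default=0.0):
    """Replace the B-factor field of an ATOM/HETATM record with a score."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave headers, TER, END, etc. untouched
    resi = int(line[22:26])             # residue number, columns 23-26
    score = scores.get(resi, default)   # assumed fallback for unscored residues
    return line[:60] + f"{score:6.2f}" + line[66:]


# Build a well-formed example record so the column positions are exact.
record = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
    1, " CA ", " ", "ALA", "A", 14, " ", 1.0, 2.0, 3.0, 1.0, 20.0
)
updated = set_bfactor(record, {14: 0.75})
print(updated)
```

A structure rewritten this way can then be colored by B-factor in a molecular viewer to display the scores.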
-
Plot mean expression score vs RSA for individual residues
Rscript script/plot_RSA.R
- Input file:
- Output file:
-
Plot mutational tolerability vs RSA for RBD DMS data
Rscript script/Mean_expression_score_in_mammalian_system.R
- Input file:
- Output file:
- ./result/RBD_exp_RSA.tsv
Rscript script/plot_RSA_RBD.R
- Input file:
- Output file:
-
Plot mutational tolerability vs distance to RBD/S2 for individual residues and categorized by antibody epitopes
Rscript script/Dist_to_RBD_S2_exp_by_Ab.R
- Input file:
- Output file:
- ./graph/Exp_vs_dist.png
- ./graph/antibody_epi_vs_mean_exp.png
Rscript script/antibody_epitopes_vs_distances.R
- Input file:
- Output file:
-
Plot mutational tolerability vs sequence conservation for individual residues
Rscript script/align_freq_vs_score.R
- Input files:
- Output file:
-
Plot RSA vs sequence conservation for individual residues
Rscript script/RSA_vs_alignment_frequency.R
- Input files:
- Output files:
-
Visualizing the mutational tolerability on the S protein structure
pymol script/plot_Bfactor_as_exp.pml
- Input files:
- Output files:
-
Analysis of the circulating NTD mutations/indels among 17 major variants
Rscript script/NTD_circulating_mutation_vs_other_residues.R
-
Mutational tolerability of selected regions compared to other residues
Rscript script/NTD_loop_vs_other_residues.R
- Input files:
- Output files:
- ./graph/NTD_loop_vs_other_residues.png
Rscript script/Hot_spots_vs_other_residues.R
- Input files:
- ./result/NTD_DMS_scores_by_resi.tsv
- Output files:
- ./graph/hot_spots_vs_other_residues.png