Analysis notebooks and scripts for "Detection of PCR chimeras in adaptive immune receptor repertoire sequences"

Description

Notebooks and scripts used to produce the figures and tables in the CHMMAIRRa paper. The figures generated from simulated data require only the databases already included in the repository. The figures generated from real data require preprocessing the datasets with IgDiscover according to the instructions in the IgDiscover_preprocessing folder.

Dependencies

Julia 1.10.5 for running the notebooks.

I recommend using juliaup to install Julia.
All Julia package dependencies are listed in the Manifest.toml file.
Re-create the environment with the following commands in julia:
```
using Pkg; Pkg.activate("."); Pkg.instantiate()
```

USEARCH v11.0.667_i86linux32 as a comparison method.

VSEARCH v2.29.1_linux_x86_64 as a comparison method.

IgDiscover v1.0.4 for preprocessing the real datasets.

MAFFT v7.490 for reference database alignment.

Muscle v5.3 for reference database alignment.

ART 2016-06-05 to simulate MiSeq noise in TRB sequences.

Shazam v1.2.0 to simulate SHM in IGH sequences.
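
Before running the scripts, it can help to confirm that the external tools are discoverable. The following is a minimal sketch, not part of the repository: `locate_tools` and `report_tools` are hypothetical helpers, and every executable name in `TOOLS` except `art_illumina` is an assumption about how the tool is installed on your system (Shazam is an R package, so the sketch only looks for `Rscript`). It does not check tool versions.

```julia
# Hypothetical helper, not part of this repository.
# Executable names are assumptions; adjust them to match your installation.
const TOOLS = ["usearch", "vsearch", "igdiscover", "mafft", "muscle", "art_illumina", "Rscript"]

# Map each tool name to its resolved path, or `nothing` if it is not on PATH.
function locate_tools(names)
    return Dict(name => Sys.which(name) for name in names)
end

# Print one line per tool and return the lookup table.
function report_tools(names = TOOLS)
    found = locate_tools(names)
    println("Julia version: ", VERSION)
    for name in names
        path = found[name]
        println(rpad(name, 14), path === nothing ? "NOT FOUND" : path)
    end
    return found
end

report_tools()
```

A tool reported as NOT FOUND may still be installed under a different name or outside PATH.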

Simulated data evaluation (Fig. 3, Fig. S1)

These scripts require the Julia environment, ART, Shazam, VSEARCH, and USEARCH.

simulate_IGH_shazam.jl : Simulates IGH V, D, and J datasets with SHM added by shazam's shmulateSeq.
simulate_TRB_art_illumina.jl : Simulates TRB V, D, and J datasets with sequencing noise added by art_illumina from ART.
ROCs.ipynb : Generates ROCs for simulated TRB and IGH V, D, and J datasets.
run_benckmarks.jl : Runs CHMMAIRRa and uchime on varying sizes of simulated IGH and TRB datasets.
benchmark_speed.ipynb : Plots the speed of CHMMAIRRa, USEARCH, and VSEARCH on simulated (and real) datasets.

Main real data analysis

The analysis of the real data involves running IgDiscover on 319 libraries with dataset-specific settings, so this takes some doing. Requires IgDiscover and the Julia environment.

IgDiscover_preprocessing : This folder contains instructions for preprocessing the real datasets with IgDiscover, with one .md file for each of the 5 datasets. It also describes where to find the raw fastq data.
run_CHMMAIRRa.jl : Runs CHMMAIRRa on all 5 real datasets in the paper (4 published and 1 new).
run_CHMMAIRRa_db_subsampling.jl : Runs CHMMAIRRa on specific real TCR and IGH libraries with subsampled databases (for Fig. 4).
recombinations.ipynb : Plots recombination information from real datasets. Produces the heatmaps (Fig. 7), recombination percentage scatterplots (Fig. 6), and database subsampling scatterplots (Fig. 4) as well as Supplementary Figures S2, S3, and S7.
PCR_conditions.ipynb : Plots this paper's PCR parameter modification dataset (Fig. 5).
lineages.ipynb : Plots lineage information from a real dataset (Fig. 1).
summarize_seqcounts.ipynb : Gathers sequence count data from all datasets (Supplementary Data 1).

Other analyses

run_CHMMAIRRa_varying_V_DFR.jl : Runs CHMMAIRRa at varied minimum differences from reference (DFR) settings (for Supplementary Figure S2).
compare_alignment_settings.jl : Runs a set of libraries with varying V database alignment methods (for Supplementary Figure S3).
databases.ipynb : Plots pairwise edit distances between database V alleles (Supplementary Figure S6).
run_CHMMAIRRa_Js.jl : Runs CHMMAIRRa on J segments for a few libraries to get a grasp on the frequency of J chimeras.

MurrellGroup/CHMMAIRRaAnalyses
