ContextSV

ContextSV is a long-read, whole-genome structural variant (SV) caller. It takes as input long-read alignments (BAM), the corresponding reference genome (FASTA), a VCF with high-quality SNPs (e.g., from GATK, DeepVariant, or NanoCaller), and gnomAD database VCF files with SNP population frequencies for each chromosome. Class documentation is available at https://wglab.openbioinformatics.org/ContextSV

Installation

Anaconda

First, install Anaconda.

Next, create a new environment. This installation has been tested with Python 3.9 on 64-bit Linux.

conda create -n contextsv python=3.9
conda activate contextsv

ContextSV and its dependencies can then be installed using the following command:

conda install -c wglab -c conda-forge -c bioconda contextsv

Docker

First, install Docker. Then pull the latest image from Docker Hub, which contains the latest release and its dependencies.

docker pull genomicslab/contextsv

Building from source (for testing/development)

ContextSV requires HTSlib as a dependency, which can be installed using Anaconda. Create an environment containing HTSlib:

conda create -n htsenv -c bioconda -c conda-forge htslib
conda activate htsenv

Then follow the instructions below to build ContextSV:

git clone https://github.com/WGLab/ContextSV
cd ContextSV
make

ContextSV can then be run:

./build/contextsv --help

Usage: ./build/contextsv [options]
Options:
  -b, --bam <bam_file>          Long-read BAM file (required)
  -r, --ref <ref_file>          Reference genome FASTA file (required)
  -s, --snp <vcf_file>          SNPs VCF file (required)
  -o, --outdir <output_dir>     Output directory (required)
  -c, --chr <chromosome>        Chromosome
  -t, --threads <thread_count>  Number of threads
  -h, --hmm <hmm_file>          HMM file
  -n, --sample-size <size>      Sample size for HMM predictions
     --min-cnv <min_length>     Minimum CNV length
     --eps <epsilon>             DBSCAN epsilon
     --min-pts-pct <min_pts_pct> Percentage of mean chr. coverage to use for DBSCAN minimum points
  -e, --eth <eth_file>          ETH file
  -p, --pfb <pfb_file>          PFB file
     --save-cnv                 Save CNV data
     --debug                    Debug mode with verbose logging
     --version                  Print version and exit
  -h, --help                    Print usage and exit
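As a minimal sketch of a first run, the snippet below checks that the input files exist and then calls ContextSV with only the four options the usage text marks as required. The file names (sample.bam, GRCh38.fa, sample.snps.vcf.gz, contextsv_output) and the check_inputs helper are placeholders for illustration and are not part of ContextSV.

```shell
# Hypothetical helper: report every missing or empty input file before a long run.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder paths; replace with your own data.
bam="sample.bam"
ref="GRCh38.fa"
snps="sample.snps.vcf.gz"
outdir="contextsv_output"

# Run ContextSV only if all three input files are present.
if check_inputs "$bam" "$ref" "$snps"; then
    ./build/contextsv --bam "$bam" --ref "$ref" --snp "$snps" --outdir "$outdir"
fi
```

Optional settings such as --chr or --threads can be appended to the same command line as needed.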

Downloading gnomAD SNP population frequencies

SNP population allele frequency information is used for copy number predictions in this tool (see PennCNV for specifics). We recommend downloading this data from the Genome Aggregation Database (gnomAD).

Download links for genome VCF files are located here (last updated April 3, 2024):

gnomAD v4.0.0 (GRCh38): https://gnomad.broadinstitute.org/downloads#4
gnomAD v2.1.1 (GRCh37): https://gnomad.broadinstitute.org/downloads#2

Script for downloading gnomAD VCFs

# Destination directory for the downloaded VCFs
download_dir="${HOME}/data/gnomad/v4.0.0/"

chr_list=("1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "X" "Y")

for chr in "${chr_list[@]}"; do
    echo "Downloading chromosome ${chr}..."
    wget "https://storage.googleapis.com/gcp-public-data--gnomad/release/4.0/vcf/genomes/gnomad.genomes.v4.0.sites.chr${chr}.vcf.bgz" -P "${download_dir}"
done
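Because the per-chromosome files are large, a download can fail partway through. The following sketch lists the chromosomes whose VCF is missing or empty in the download directory. The missing_gnomad_chrs helper is hypothetical, and the directory is assumed to be the one used in the download script.

```shell
# Hypothetical helper: print each chromosome whose gnomAD v4.0 VCF
# is missing or empty in the given directory.
missing_gnomad_chrs() {
    dir="$1"
    for chr in $(seq 1 22) X Y; do
        f="${dir}/gnomad.genomes.v4.0.sites.chr${chr}.vcf.bgz"
        [ -s "$f" ] || echo "$chr"
    done
}

# Assumed download location; adjust if you used a different directory.
missing_gnomad_chrs "${HOME}/data/gnomad/v4.0.0"
```

An empty result means all 24 files are present. This checks only for presence, not file integrity.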

Finally, create a text file that maps each chromosome to its gnomAD VCF file path, with one chromosome=path entry per line. This file will be passed in as an argument:

gnomadv4_filepaths.txt

1=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr1.vcf.bgz
2=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr2.vcf.bgz
3=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chr3.vcf.bgz
...
X=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chrX.vcf.bgz
Y=~/data/gnomad/v4.0.0/gnomad.genomes.v4.0.sites.chrY.vcf.bgz
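Rather than typing all 24 entries by hand, the file can be generated with a short loop. This is a sketch that assumes the VCFs were downloaded to the directory shown above. It writes absolute paths built from ${HOME} instead of ~, on the assumption that absolute paths are the safer choice when a program reads paths from a file.

```shell
# Assumed download location and output file name; adjust to your setup.
download_dir="${HOME}/data/gnomad/v4.0.0"
out_file="gnomadv4_filepaths.txt"

# Start from an empty file, then append one chromosome=path entry per line.
: > "$out_file"
for chr in $(seq 1 22) X Y; do
    echo "${chr}=${download_dir}/gnomad.genomes.v4.0.sites.chr${chr}.vcf.bgz" >> "$out_file"
done
```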

Revision history

For release history, please visit here.

Getting help

Please post any issues on the ContextSV issues page. We will respond to your questions quickly. Your comments are critical for improving the tool and will benefit other users.
