CANVAS (formerly CNVFlow) is an internal, modular pipeline for automated detection and annotation of copy number variants (CNVs) from low-pass whole-genome sequencing (WGS) data.
⚠️ This repository is a summary only. The full pipeline code is private and is not publicly available because of internal-use and licensing restrictions.
CANVAS automates the CNV analysis process from raw FASTQ to gene- and cell-type–annotated CNV calls.
Input: Paired-end FASTQ files
Output: Annotated CNV calls with gene names, cell types, and germ layer information
- FASTQ discovery
- BWA-MEM alignment to GRCh38
- BAM filtering with samtools
- CNV calling using Control-FREEC
- Annotation with ChIPseeker and PanglaoDB
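The alignment and filtering stages listed above can be sketched as a small command builder. This is a minimal illustration, not the private pipeline code (which is written in Bash and R): the function name, the output file naming, the MAPQ threshold of 20, and the `-F 4` (drop unmapped reads) filter are all assumptions made for the example. Only standard `bwa mem` and `samtools` options are used.

```python
import shlex
from pathlib import Path


def build_alignment_commands(sample, r1, r2, reference, out_dir,
                             threads=4, min_mapq=20):
    """Return shell command strings that align and filter one sample.

    Hypothetical helper: min_mapq and the -F 4 flag are illustrative
    defaults, not the thresholds used by the real pipeline.
    """
    bam = Path(out_dir) / f"{sample}.filtered.sorted.bam"

    # Paired-end alignment with BWA-MEM; SAM is written to stdout.
    align = ["bwa", "mem", "-t", str(threads), str(reference), str(r1), str(r2)]
    # Keep mapped reads (-F 4) at or above the MAPQ cutoff, emit BAM (-b).
    filt = ["samtools", "view", "-b", "-q", str(min_mapq), "-F", "4", "-"]
    # Coordinate-sort the filtered stream into the final BAM.
    sort = ["samtools", "sort", "-@", str(threads), "-o", str(bam), "-"]

    pipeline = " | ".join(shlex.join(cmd) for cmd in (align, filt, sort))
    index = shlex.join(["samtools", "index", str(bam)])
    return [pipeline, index]
```

Returning command strings instead of running them keeps the sketch testable without the tools installed; the strings could then be passed to a shell or a scheduler. The Control-FREEC and annotation stages are left out because their configuration is not described in this summary.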
- Designed and deployed the full pipeline architecture using Bash and R
- Integrated Control-FREEC and custom filtering steps
- Automated QC metrics and error handling across multiple samples
- Managed parallel execution and LSF job submission
- Developed downstream R-based gene, cell type, and germ layer annotation modules
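As a rough illustration of the LSF job submission mentioned above, the sketch below assembles one `bsub` command line per sample. The helper name, job naming scheme, log directory, and the per-sample script `run_sample.sh` are hypothetical; the `bsub` options used (`-J`, `-n`, `-R`, `-o`, `-e`, `-q`) are standard LSF options, but the real pipeline may submit jobs differently.

```python
import shlex


def build_bsub_command(job_name, command, threads=4, queue=None, log_dir="logs"):
    """Build a bsub command line for one per-sample job.

    Hypothetical helper: the log layout and resource request are
    assumptions for illustration only.
    """
    args = [
        "bsub",
        "-J", job_name,                          # job name shown by bjobs
        "-n", str(threads),                      # number of job slots
        "-R", "span[hosts=1]",                   # keep all slots on one host
        "-o", f"{log_dir}/{job_name}.%J.out",    # %J expands to the LSF job ID
        "-e", f"{log_dir}/{job_name}.%J.err",
    ]
    if queue:
        args += ["-q", queue]
    args.append(command)                         # quoted as a single argument
    return shlex.join(args)
```

For example, `build_bsub_command("canvas_S1", "bash run_sample.sh S1", threads=8, queue="normal")` yields a single string that can be handed to a shell. Requesting all slots on one host matters for multithreaded tools such as BWA-MEM, which cannot spread threads across nodes.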
| Tool | Purpose |
|---|---|
| BWA-MEM | Read alignment |
| samtools | BAM filtering |
| Control-FREEC | CNV detection |
| R + Bioconductor | Annotation (ChIPseeker, PanglaoDB) |
| Bash scripting | Workflow logic |
| LSF cluster | Parallelization and job control |
- `*_CNVs`: CNV calls from Control-FREEC
- `*_CNVs_annotated.csv`: Gene + cell type–annotated CNVs
- `*_CNV_plot_filtered.png`: Visual CNV plots (linear/log2 scale)
Annotated CSV includes: `SYMBOL`, `cell_types`, `germ_layers`, `CNV`, `annotation`, `Description`
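For downstream use, the annotated CSV can be read with any standard CSV reader. The sketch below groups gene symbols by germ layer using the `SYMBOL` and `germ_layers` columns named above. How multiple germ layers are encoded in one cell is not documented in this summary, so the `;` separator is an assumption, as is the function name.

```python
import csv
from collections import defaultdict


def genes_by_germ_layer(lines, separator=";"):
    """Group gene symbols by germ layer from an annotated CNV table.

    `lines` is any iterable of CSV text lines (an open file works).
    The multi-value separator is an assumption, not a documented format.
    """
    groups = defaultdict(set)
    for row in csv.DictReader(lines):
        symbol = (row.get("SYMBOL") or "").strip()
        if not symbol:
            continue  # skip CNVs that did not map to a gene
        for layer in (row.get("germ_layers") or "").split(separator):
            layer = layer.strip()
            if layer:
                groups[layer].add(symbol)
    # Sorted lists make the result deterministic and easy to compare.
    return {layer: sorted(genes) for layer, genes in groups.items()}
```

A typical call would be `genes_by_germ_layer(open("sample_CNVs_annotated.csv", newline=""))`, where the file name is a placeholder that follows the `*_CNVs_annotated.csv` pattern.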
- Auto-detection of missing FASTQs
- Alignment and filtering statistics
- Resume capability for interrupted jobs
- Comprehensive logging per sample and pipeline stage
- Optimized for 0.5–2x WGS coverage
- Supports matched-control and control-free modes
- Parallel sample processing with configurable thread counts
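The auto-detection of missing FASTQs listed above could work roughly as follows: group file names by sample, then flag any sample that lacks one mate. This is a sketch under an assumed naming convention (`<sample>_R1` / `<sample>_R2`, with an optional numeric suffix such as `_001` and a `.fastq`, `.fq`, or `.gz` extension); the pattern and function name are hypothetical, and the private pipeline may use different rules.

```python
import re

# Assumed naming convention, e.g. S1_R1.fastq.gz or S1_L001_R2_001.fq
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$"
)


def pair_fastqs(file_names):
    """Split FASTQ file names into complete pairs and incomplete samples.

    Returns (complete, incomplete):
      complete   maps sample -> (R1 name, R2 name)
      incomplete maps sample -> the mate that is missing ("R1" or "R2")
    Files that do not match the assumed pattern are ignored.
    """
    found = {}
    for name in sorted(file_names):
        match = FASTQ_PATTERN.match(name)
        if match:
            found.setdefault(match["sample"], {})[match["read"]] = name

    complete, incomplete = {}, {}
    for sample, reads in found.items():
        if "1" in reads and "2" in reads:
            complete[sample] = (reads["1"], reads["2"])
        else:
            incomplete[sample] = "R2" if "1" in reads else "R1"
    return complete, incomplete
```

To scan a directory, pass `(p.name for p in pathlib.Path(fastq_dir).iterdir())`. Samples in `incomplete` can be logged and skipped before any alignment job is submitted. Note that this simple version keeps only one file per mate, so multi-lane samples are treated as separate samples (the lane tag becomes part of the sample name).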
This summary reflects internal work used for research and client-facing genomics services. The pipeline code is not open source and is not available for distribution.
- LAAVA-summary – rAAV integration detection from ONT data