sdegeorgia/CANVAS-summary

CANVAS: CNV Analysis Next-gen Variant Assessment Suite

CANVAS (formerly CNVFlow) is an internal, modular pipeline for automated detection and annotation of copy number variants (CNVs) from low-pass whole genome sequencing (WGS) data.

⚠️ This repository is a summary. The full pipeline code is private and not publicly available due to internal usage and licensing restrictions.


Overview

CANVAS automates the CNV analysis process from raw FASTQ to gene- and cell-type–annotated CNV calls.

Input: Paired-end FASTQ files
Output: Annotated CNV calls with gene names, cell types, and germ layer information

Pipeline Stages

  1. FASTQ discovery
  2. BWA-MEM alignment to GRCh38
  3. BAM filtering with samtools
  4. CNV calling using Control-FREEC
  5. Annotation with ChIPseeker and PanglaoDB
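
As a rough illustration of how stages 2 to 4 chain together, the sketch below builds the shell commands for one sample. The actual CANVAS code is private, so everything here is an assumption: the function name `build_stage_commands`, the output file names, the MAPQ threshold of 20, and the `-F 0x904` filter (which drops unmapped, secondary, and supplementary reads) are illustrative choices, not the pipeline's real settings. Stage 5 runs in R and is not shown.

```python
import shlex


def build_stage_commands(sample, r1, r2, reference, freec_config,
                         threads=4, min_mapq=20):
    """Return shell commands for alignment, filtering, and CNV calling.

    All file names and thresholds are hypothetical defaults for illustration.
    """
    q = shlex.quote  # protect paths that contain spaces or shell metacharacters
    sorted_bam = f"{sample}.sorted.bam"
    filtered_bam = f"{sample}.filtered.bam"
    return [
        # Stage 2: align paired-end reads with BWA-MEM and coordinate-sort the output
        f"bwa mem -t {threads} {q(reference)} {q(r1)} {q(r2)} "
        f"| samtools sort -@ {threads} -o {q(sorted_bam)} -",
        # Stage 3: keep reads at or above min_mapq; drop unmapped/secondary/supplementary
        f"samtools view -b -q {min_mapq} -F 0x904 -o {q(filtered_bam)} {q(sorted_bam)}",
        f"samtools index {q(filtered_bam)}",
        # Stage 4: Control-FREEC reads its inputs from a config file
        f"freec -conf {q(freec_config)}",
    ]


if __name__ == "__main__":
    for cmd in build_stage_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz",
                                    "GRCh38.fa", "S1_freec.cfg"):
        print(cmd)
```

In a setup like this, the Control-FREEC config file would need to point at the filtered BAM, so the config would typically be generated per sample before stage 4 runs.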

My Role

  • Designed and deployed the full pipeline architecture using Bash and R
  • Integrated Control-FREEC and custom filtering steps
  • Automated QC metrics and error-handling across multiple samples
  • Managed parallel execution and LSF job submission
  • Developed downstream R-based gene, cell type, and germ layer annotation modules

Tools & Frameworks

| Tool | Purpose |
|------|---------|
| BWA-MEM | Read alignment |
| samtools | BAM filtering |
| Control-FREEC | CNV detection |
| R + Bioconductor | Annotation (ChIPseeker, PanglaoDB) |
| Bash scripting | Workflow logic |
| LSF cluster | Parallelization and job control |

Sample Output Files

  • *_CNVs: Raw CNV calls from Control-FREEC
  • *_CNVs_annotated.csv: Gene- and cell-type–annotated CNVs
  • *_CNV_plot_filtered.png: CNV plots (linear and log2 scale)

The annotated CSV includes the following columns:

  • SYMBOL, cell_types, germ_layers, CNV, annotation, Description
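
For downstream use, a consumer of the annotated CSV might first check that these columns are present and then tally rows by CNV value. The sketch below does that with the standard library only. The function name `summarize_annotated_cnvs` is hypothetical, and the values stored in the `CNV` column are not documented here, so the code simply counts whatever labels it finds.

```python
import csv
from collections import Counter

# Column names as listed in this README
EXPECTED_COLUMNS = ["SYMBOL", "cell_types", "germ_layers",
                    "CNV", "annotation", "Description"]


def summarize_annotated_cnvs(handle):
    """Validate the header of an annotated CSV and count rows per CNV label.

    `handle` is any open text file or file-like object.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"annotated CSV is missing columns: {missing}")
    return Counter(row["CNV"] for row in reader)
```

A typical call would open a `*_CNVs_annotated.csv` file with `newline=""` and pass the handle in.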

Built-in QC Features

  • Auto-detection of missing FASTQs
  • Alignment and filtering statistics
  • Resume capability for interrupted jobs
  • Comprehensive logging per sample and pipeline stage
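
The first feature above, auto-detection of missing FASTQs, can be sketched as a pairing check over file names. This is a minimal sketch and not the pipeline's actual logic: it assumes mates are distinguished by `_R1` / `_R2` in the file name (optionally followed by a numeric chunk such as `_001`), and `pair_fastqs` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1[_NNN].fastq[.gz] or .fq[.gz]
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q(?:\.gz)?$"
)


def pair_fastqs(filenames):
    """Group FASTQ names by sample and report samples lacking a mate.

    Returns (complete, incomplete): complete maps sample -> (R1, R2),
    incomplete maps sample -> the name of the missing read ("R1" or "R2").
    """
    found = defaultdict(dict)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match:  # non-FASTQ files are silently ignored
            found[match["sample"]][match["read"]] = name

    complete, incomplete = {}, {}
    for sample, reads in sorted(found.items()):
        if "1" in reads and "2" in reads:
            complete[sample] = (reads["1"], reads["2"])
        else:
            incomplete[sample] = "R2" if "1" in reads else "R1"
    return complete, incomplete
```

Samples in `incomplete` could be logged and skipped so that one missing file does not stop the rest of the batch.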

Performance

  • Optimized for 0.5–2x WGS coverage
  • Supports matched-control and control-free modes
  • Parallel sample processing with configurable thread counts
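
To check whether a sample falls in the 0.5–2x range, mean coverage can be estimated as total sequenced bases divided by genome size. The sketch below assumes a genome size of roughly 3.1 Gb for GRCh38 and counts all sequenced bases, so it slightly overestimates coverage after filtering. The function names are hypothetical.

```python
def estimate_coverage(read_pairs, read_length, genome_size=3.1e9):
    """Approximate mean coverage for paired-end data.

    genome_size defaults to ~3.1 Gb, an approximate figure for GRCh38.
    """
    return read_pairs * 2 * read_length / genome_size


def in_optimized_range(coverage, low=0.5, high=2.0):
    """True if coverage lies within the range the pipeline is optimized for."""
    return low <= coverage <= high


if __name__ == "__main__":
    cov = estimate_coverage(read_pairs=10_000_000, read_length=150)
    print(f"{cov:.2f}x, in range: {in_optimized_range(cov)}")
```

Under these assumptions, 10 million pairs of 150 bp reads give a little under 1x coverage, which sits inside the optimized range.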

🔒 Disclaimer

This summary reflects internal work used for research and client-facing genomics services. The pipeline code is not open source and is not available for distribution.


👩‍🔬 Related Projects
