VariantSeq-Pipeline is a bioinformatics pipeline that processes DNA sequencing data and identifies genomic variants. It automates every step from downloading raw sequencing reads to generating a VCF (Variant Call Format) file, so variant calling runs as a single workflow.

Prerequisites:

- Conda (Anaconda or Miniconda)
- Internet connection for downloading data and tools
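
Before running the pipeline, it can help to confirm that Conda is available. The following is a minimal sketch, not part of the repository: the helper names `have` and `check_prereqs` are hypothetical, and the check only looks for the command on `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking prerequisites; not taken from dna_seq_pipeline.sh.

# Succeed if the named command is on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return non-zero if any tool is missing.
check_prereqs() {
  local missing=0
  local tool
  for tool in "$@"; do
    if have "$tool"; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Conda is the only tool the prerequisites name explicitly.
check_prereqs conda || echo "Install Anaconda or Miniconda before running the pipeline."
```

The same function accepts several tool names at once, for example `check_prereqs conda bash`.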

Pipeline steps:

- Sets up the conda environment and installs required tools.
- Downloads raw sequencing reads (FASTQ files) and the reference genome.
- Fetches the reference genome using Biopython.
- Trims the sequencing reads to remove low-quality bases.
- Aligns the trimmed reads to the reference genome.
- Converts the alignment from SAM to BAM, sorts the BAM file, and generates mapping statistics.
- Calls variants and outputs them in VCF format.
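
The trimming, alignment, conversion, and variant-calling steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `dna_seq_pipeline.sh`: this README does not name the tools, so `fastp`, `bwa`, `samtools`, and `bcftools` are assumed here, as are single-end reads and the input names `reads.fastq` and `reference.fasta`. The environment setup and download steps are omitted because the accessions are not given. The sketch defaults to a dry run that only prints each command.

```shell
#!/usr/bin/env bash
# Sketch only: tool choices and input file names are assumptions.
set -euo pipefail

REF="${REF:-reference.fasta}"   # hypothetical reference genome file
READS="${READS:-reads.fastq}"   # hypothetical single-end FASTQ file
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands only, 0 = execute them

# Print a command line; execute it only when DRY_RUN=0.
# Commands are passed as one string so pipes and redirects survive
# (paths containing spaces are not handled in this sketch).
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    eval "$*"
  fi
}

pipeline() {
  # Trim low-quality bases from the reads.
  run "fastp -i $READS -o trimmed.fastq"
  # Index the reference, then align the trimmed reads to it.
  run "bwa index $REF"
  run "bwa mem $REF trimmed.fastq > aligned.sam"
  # Convert SAM to BAM, sort, and write mapping statistics.
  run "samtools view -b -o output.bam aligned.sam"
  run "samtools sort -o output.sorted.bam output.bam"
  run "samtools flagstat output.sorted.bam > mappingstats.txt"
  # Call variants and write them as uncompressed VCF.
  run "bcftools mpileup -Ou -f $REF output.sorted.bam | bcftools call -mv -Ov -o variants.raw.vcf"
}

pipeline
```

Setting `DRY_RUN=0` executes the commands instead of printing them, provided the assumed tools are installed in the active conda environment.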

Output files:

- output.sorted.bam: Sorted BAM file of the aligned reads.
- variants.raw.vcf: VCF file containing the called variants.
- mappingstats.txt: Alignment statistics.
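
A quick sanity check on the outputs is to count the variant records in the VCF and print the mapping statistics. The sketch below is illustrative: the function name `count_variants` is hypothetical, and it relies only on the VCF convention that header lines start with `#`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count data records in an uncompressed VCF file.
count_variants() {
  # Header lines start with '#'; every other non-empty line is one variant record.
  awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Report on the output files only if they exist in the current directory.
if [ -f variants.raw.vcf ]; then
  echo "variant records: $(count_variants variants.raw.vcf)"
fi
if [ -f mappingstats.txt ]; then
  cat mappingstats.txt
fi
```

A count of zero does not necessarily indicate a failure, but it is worth checking the alignment statistics in that case.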

Repository files:

- dna_seq_pipeline.sh: The main script for executing the pipeline.
- README.md: This documentation file.