Overview

VariantSeq-Pipeline is a bioinformatics pipeline that processes DNA sequencing data and identifies genomic variants. It automates every step from downloading the raw sequencing reads to generating a VCF (Variant Call Format) file, so variant calling can be run as a single workflow.

Requirements

Conda (Anaconda or Miniconda)
Internet connection for downloading data and tools
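
Before running the pipeline, it can help to confirm that Conda is reachable from the shell. The snippet below is a minimal sketch that is not part of the repository; the helper name `missing_tools` is hypothetical, and it only checks whether an executable is on the PATH.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


# "conda" is the only tool the requirements list names explicitly.
missing = missing_tools(["conda"])
if missing:
    print("Missing required tools:", ", ".join(missing))
else:
    print("All required tools found.")
```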

Pipeline Steps:

Sets up the conda environment and installs required tools.
Downloads the raw sequencing reads (FASTQ files).
Fetches the reference genome using Biopython.
Trims the sequencing reads to remove low-quality bases.
Aligns the trimmed reads to the reference genome.
Converts the alignment from SAM to BAM, sorts the BAM file, and generates mapping statistics.
Calls variants and outputs them in VCF format.
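
The steps from trimming to variant calling can be sketched as a dry-run command plan. This README does not name the tools that dna_seq_pipeline.sh installs, so the tool choices here (fastp, bwa, samtools, bcftools) are assumptions, as are single-end reads and the input and intermediate file names (`reads.fastq`, `reference.fasta`, `trimmed.fastq`, `aligned.sam`, `aligned.bam`, `pileup.bcf`). Only the three output file names come from this document. The sketch prints the commands and executes nothing.

```python
import shlex


def build_plan(reads="reads.fastq", ref="reference.fasta"):
    """Return (description, argv, stdout_file) tuples; nothing is executed.

    Tool choices are assumptions, not taken from the pipeline script.
    """
    return [
        ("Trim low-quality bases",
         ["fastp", "-i", reads, "-o", "trimmed.fastq"], None),
        ("Index the reference genome",
         ["bwa", "index", ref], None),
        ("Align trimmed reads",
         ["bwa", "mem", "-o", "aligned.sam", ref, "trimmed.fastq"], None),
        ("Convert SAM to BAM",
         ["samtools", "view", "-b", "-o", "aligned.bam", "aligned.sam"], None),
        ("Sort the BAM file",
         ["samtools", "sort", "-o", "output.sorted.bam", "aligned.bam"], None),
        ("Generate mapping statistics",
         ["samtools", "flagstat", "output.sorted.bam"], "mappingstats.txt"),
        ("Compute genotype likelihoods",
         ["bcftools", "mpileup", "-f", ref, "-O", "b",
          "-o", "pileup.bcf", "output.sorted.bam"], None),
        ("Call variants",
         ["bcftools", "call", "-m", "-v", "-O", "v",
          "-o", "variants.raw.vcf", "pileup.bcf"], None),
    ]


def format_plan(plan):
    """Render the plan as commented shell lines for inspection."""
    lines = []
    for description, argv, stdout_file in plan:
        command = shlex.join(argv)
        if stdout_file is not None:
            # The tool writes to stdout, so redirect it into the named file.
            command += " > " + shlex.quote(stdout_file)
        lines.append("# " + description)
        lines.append(command)
    return "\n".join(lines)


print(format_plan(build_plan()))
```

Keeping the plan as data makes it easy to review or swap a tool before anything runs. The actual commands and flags used by the pipeline are in dna_seq_pipeline.sh.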

Output Files:

output.sorted.bam: Sorted BAM file of the aligned reads.
variants.raw.vcf: VCF file containing the called variants.
mappingstats.txt: Alignment statistics.
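
To inspect the called variants without extra dependencies, the VCF can be read as tab-separated text. The following is a minimal sketch, assuming a standard VCF layout in which the first five columns are CHROM, POS, ID, REF and ALT. The function name `read_vcf_records` is hypothetical, and the example records are made up for illustration rather than taken from pipeline output.

```python
def read_vcf_records(lines):
    """Yield (chrom, pos, ref, alts) for each data line of a VCF."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, "##" meta lines and the "#CHROM" header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, _variant_id, ref, alt = fields[:5]
        # ALT may list several alleles separated by commas.
        yield chrom, int(pos), ref, alt.split(",")


# Made-up example lines; they are not real results.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t101\t.\tA\tG\t50\t.\t.\n",
    "chr1\t250\t.\tCT\tC,CTT\t32\t.\t.\n",
]
records = list(read_vcf_records(example))
for chrom, pos, ref, alts in records:
    print(chrom, pos, ref, ",".join(alts))
```

For a real run, pass an open file handle instead of the example list, for instance `read_vcf_records(open("variants.raw.vcf"))`.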

File Structure

dna_seq_pipeline.sh: The main script for executing the pipeline.
README.md: This documentation file.
