BAM/CRAM QC

The European Genome-phenome Archive (EGA) currently stores nearly 3 million BAM and CRAM files, and this number continues to grow thanks to contributions from the scientific community.

To improve the quality reports we generate for each of these files, we have developed a set of pipelines that automate the use of multiple bioinformatics tools for comprehensive quality assessment.

If you'd like to use these pipelines, please follow the Start Guide. For further details on how the scripts work, refer to the Documentation.

Example: BAM file from the 1000 Genomes Project

To illustrate the pipeline, we ran it on a small BAM file from the 1000 Genomes Project. Note that this BAM file contains alignments exclusively from chromosome 11.
You can download the input file here, and you can review the resulting output folder.
It matches the structure and content you should expect if you follow the steps in the guide.

To learn more about the 1000 Genomes Project, visit their official website.

Running the QC pipeline with Docker

To simplify installation and avoid dependency issues, we provide a Docker-based setup that runs the entire pipeline end-to-end.

1. Build the Docker image

From the root of the repository:

docker build -t bam-qc .

Important: Make sure the following files have execute permissions before building the image:

run/qualimap_v2.3/qualimap
run/BAM_pipeline_2.py
output/BAM_finalize_2.py
run/wrapper.py

If they do not, set the permissions manually:

chmod +x run/qualimap_v2.3/qualimap run/BAM_pipeline_2.py output/BAM_finalize_2.py run/wrapper.py
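
If you prefer to check the permissions from a script before building, the following is a minimal sketch in Python. The function name `ensure_executable` and the constant `REQUIRED_EXECUTABLES` are hypothetical helpers written for this example and are not part of the repository. The file paths are the four listed above. The sketch sets the execute bit for user, group and others, which is roughly what `chmod a+x` does.

```python
import stat
from pathlib import Path

# The four files named in the README, relative to the repository root.
REQUIRED_EXECUTABLES = [
    "run/qualimap_v2.3/qualimap",
    "run/BAM_pipeline_2.py",
    "output/BAM_finalize_2.py",
    "run/wrapper.py",
]

# Execute bit for user, group and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def ensure_executable(repo_root, relative_paths=REQUIRED_EXECUTABLES):
    """Add execute bits to each listed file that lacks them.

    Returns the relative paths whose mode was changed.
    Raises FileNotFoundError if a listed file is missing.
    """
    changed = []
    for rel in relative_paths:
        path = Path(repo_root) / rel
        if not path.is_file():
            raise FileNotFoundError(f"Expected file not found: {path}")
        mode = path.stat().st_mode
        if (mode & EXEC_BITS) != EXEC_BITS:
            path.chmod(mode | EXEC_BITS)
            changed.append(rel)
    return changed


if __name__ == "__main__":
    # Assumes the script is started from the root of the repository.
    for rel in ensure_executable(Path.cwd()):
        print(f"made executable: {rel}")
```

Running it a second time should print nothing, since all four files already carry the execute bits.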

2. Run the pipeline

If your BAM, BED and FASTA files are located in /absolute/path/to/, run the container like this:

docker run --rm \
  -v /absolute/path/to:/data \
  -v "$(pwd)/output":/app/output \
  bam-qc \
  --bam /data/muestra1.bam \
  --bed /data/regions.bed \
  --fasta /data/reference.fasta

This command:

Mounts your local folder containing the input files as /data inside the container
Mounts the repository's output/ directory (already present as BAM_QC/output/) at /app/output inside the container to store the results
Passes the required arguments (--bam, --bed, --fasta) to the pipeline
Executes the full QC workflow and generates a multiqc_report.html in the output/ folder
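
When the pipeline is launched for many samples, it can help to assemble the same `docker run` invocation programmatically. The sketch below is one possible way to do that in Python. The function `build_docker_command` is a hypothetical helper, not something shipped with the repository. The image name, mount points and the `--bam`, `--bed` and `--fasta` arguments are taken from the command above. The sketch insists on absolute host paths so that Docker does not interpret a `-v` value as a named volume.

```python
import shlex
from pathlib import Path


def build_docker_command(input_dir, output_dir, bam, bed, fasta, image="bam-qc"):
    """Return the `docker run` argument list shown in the README.

    bam, bed and fasta are file names inside input_dir, which is
    mounted at /data in the container.
    """
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    if not input_dir.is_absolute() or not output_dir.is_absolute():
        raise ValueError("use absolute host paths for the bind mounts")
    return [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/data",
        "-v", f"{output_dir}:/app/output",
        image,
        "--bam", f"/data/{bam}",
        "--bed", f"/data/{bed}",
        "--fasta", f"/data/{fasta}",
    ]


if __name__ == "__main__":
    cmd = build_docker_command(
        "/absolute/path/to",
        Path.cwd() / "output",
        "muestra1.bam",
        "regions.bed",
        "reference.fasta",
    )
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.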

3. Run MultiQC

To create the report, run the following command from inside the output/ folder:

multiqc . -e picard -e qualimap -c multiqc_config.yaml

You're done! Check the results in the multiqc_report.html file.
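
In an automated run you may want to confirm that the report was actually written before moving on. The following Python sketch does only that. `find_report` is a hypothetical helper name, and the only thing it assumes about the pipeline is the report file name `multiqc_report.html` in the output/ folder, as stated above.

```python
from pathlib import Path


def find_report(output_dir, name="multiqc_report.html"):
    """Return the report path if it exists and is non-empty, else None."""
    report = Path(output_dir) / name
    if report.is_file() and report.stat().st_size > 0:
        return report
    return None


if __name__ == "__main__":
    # Assumes the script is started from the root of the repository.
    report = find_report(Path.cwd() / "output")
    if report is None:
        print("multiqc_report.html not found or empty")
    else:
        print(f"report ready: {report}")
```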

Benchmarking on other file types

Because the test file is relatively small, we also evaluated the pipeline on:

A 176 GB WGS BAM file
A 5.8 GB RNA-seq BAM file

You can check the runtime performance and resource usage in the test/performance_logs folder.
