
Usage and parameters

Thomas Cokelaer edited this page Aug 14, 2020 · 6 revisions

Usage

Overview

This pipeline demultiplexes Illumina data given a sample sheet file that describes the experiments and a raw data directory produced by the sequencer (the bcl directory, for base-call library).

A sample sheet file contains your sequencing experimental design. It is required by Illumina sequencers, so you should already have it and be aware of what it contains. If not, ask your wet-lab colleagues to provide this file. Avoid writing it yourself; instead, use the Illumina tool IEM, which double-checks the indices and their names.

Demultiplexing your NGS data is easy as long as your sample sheet is correct. All you need is the location of your raw data and your sample sheet; after that, it is a one-line command. This pipeline has a very simple integrated sample sheet validator, but it checks for only a few basic errors.
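Since the two inputs are all that is required, a small pre-flight check can catch a wrong path before the pipeline is initialised. The sketch below is an illustration only: `check_inputs` is a hypothetical helper, not part of sequana_demultiplex, and it only verifies that the sample sheet is a non-empty file and that the raw data directory exists. It does not validate the content of either.

```shell
# Hypothetical pre-flight check (not part of sequana_demultiplex).
# $1: path to the sample sheet, $2: path to the raw data (bcl) directory.
check_inputs() {
    # -s is true only for an existing, non-empty file
    if [ ! -s "$1" ]; then
        echo "sample sheet missing or empty: $1" >&2
        return 1
    fi
    # -d is true only for an existing directory
    if [ ! -d "$2" ]; then
        echo "raw data directory not found: $2" >&2
        return 1
    fi
}
```

For example, `check_inputs SampleSheet.csv raw_data_path` returns a non-zero status and prints a message on stderr if either input is missing.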

The demultiplexing may fail for various reasons, and the final error message may be unhelpful or difficult to understand. In our experience, 99% of the errors come from an erroneous sample sheet.

There are lots of resources online. We provide this page, https://biomics.pasteur.fr/drylab/samplesheet.html, which gives an example. Note that although its name ends in .csv, a sample sheet is not a CSV file per se. This is a common source of errors: people open the sample sheet in an office spreadsheet program and end up with commas everywhere.
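One visible symptom of a spreadsheet round trip is a section header such as `[Header]` or `[Data]` followed by a run of commas. The sketch below looks for that pattern only. `find_padded_headers` is a hypothetical helper, and it assumes that section headers are written in square brackets at the start of a line; it is not a full validator and will not detect other damage.

```shell
# Hypothetical helper: list section headers padded with trailing commas,
# e.g. "[Header],,,," instead of "[Header]".
# Prints "line_number:content" for each match; exit status is 0 if any
# padded header was found and non-zero otherwise (standard grep behaviour).
find_padded_headers() {
    # [[:space:]]* tolerates a trailing carriage return from Windows line endings
    grep -nE '^\[[^]]+\],+[[:space:]]*$' "$1"
}
```

Running `find_padded_headers SampleSheet.csv` prints nothing when no section header is padded.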

Basic usage

Once sequana_demultiplex is installed (as explained on the main page), you first need to initialise the pipeline as follows:

sequana_demultiplex --sample-sheet SampleSheet.csv --bcl-directory raw_data_path --merging-strategy merge

then execute the pipeline:

cd demultiplex
sh demultiplex.sh  # on your laptop; beware that this needs a lot of memory

The last command executes a Snakemake pipeline locally. If you are on a SLURM cluster, the script demultiplex.sh should already include the SLURM options, and you just need to type:

cd demultiplex 
srun -c 1 sh demultiplex.sh  
# or sbatch -c 1 --wrap "sh demultiplex.sh"

In both cases, once done, go to the output directory (fastq/) and open the summary.html file. If everything is fine, you can clean up the directory as follows:

make clean

Below is a link to an example report. From it you should get a quick overview of the quality of the demultiplexing and see whether an index is missing from the sample sheet or a lane has a poor yield.

HTML example

The option --merging-strategy is important. NextSeq users should merge the lanes; HiSeq users, in general, should not. The strategy can be either 'merge' or 'none'. If you wish to merge only some lanes (e.g., 1 with 2 and 3 with 4), please contact us via the issue page.
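The rule of thumb above can be written as a small lookup. This is a sketch only: `suggest_merging_strategy` is a hypothetical helper, the sequencer names it matches are assumptions, and the HiSeq answer is just the general default, so check it against your own run.

```shell
# Hypothetical helper: suggest a --merging-strategy value from the sequencer family.
# NextSeq -> merge, HiSeq -> none (general default); anything else is an error.
suggest_merging_strategy() {
    case "$1" in
        NextSeq*|nextseq*) echo "merge" ;;
        HiSeq*|hiseq*)     echo "none" ;;
        *) echo "no default strategy for sequencer: $1" >&2; return 1 ;;
    esac
}
```

The result can then be passed on the command line, for instance `--merging-strategy "$(suggest_merging_strategy NextSeq500)"`.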
