@@ -11,6 +11,7 @@ The following programs need to be installed and in the PATH environment variable
 - [*fastqc*](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/INSTALL.txt)
 - [*cutadapt*](http://cutadapt.readthedocs.io/en/stable/installation.html), which requires Python version > 2.7
 - [*trim_galore*](https://github.com/FelixKrueger/TrimGalore)
+- [*RSeQC*](http://rseqc.sourceforge.net/)
 - [*multiQC*](http://multiqc.info/docs/)
 - [*STAR*](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf)
 - [*htseq*](http://www-huber.embl.de/HTSeq/doc/install.html#install); the Python script *htseq-count* must also be in the PATH
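Before launching the pipeline, it can help to confirm that the prerequisites listed in this hunk are actually on the PATH. The following is a minimal bash sketch. The function name `check_tools` is hypothetical, and the executable names (in particular `read_distribution.py`, used only as a representative RSeQC script) are assumptions that may differ from what the pipeline actually calls.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every executable from the argument list that
# cannot be found on the PATH; return 0 if all are found, 1 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Executable names are assumptions based on the list above;
# read_distribution.py only stands in for the RSeQC scripts.
check_tools fastqc cutadapt trim_galore read_distribution.py multiqc STAR htseq-count \
  || echo "install the missing tools before running the pipeline"
```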
@@ -21,6 +22,17 @@ In addition, STAR requires genome indices that can be generated from a genome fa
 STAR --runThreadN n --runMode genomeGenerate --genomeDir ref --genomeFastaFiles ref.fa --sjdbGTFfile ref.gtf --sjdbOverhang 99
 ```
 
+### Prerequisites for alignment with hisat2
+In order to perform the optional alignment with hisat2, hisat2 must be installed:
+- [*hisat2*](https://ccb.jhu.edu/software/hisat2/index.shtml)
+
+In addition, index files *.ht2* must be either downloaded from the [*hisat2*](https://ccb.jhu.edu/software/hisat2/index.shtml) website or generated from a reference fasta file (e.g., reference.fa) and a GTF annotation file (e.g., reference.gtf) using the following commands:
+```bash
+extract_splice_sites.py reference.gtf > genome.ss
+extract_exons.py reference.gtf > genome.exon
+hisat2-build reference.fa --ss genome.ss --exon genome.exon genome_tran
+```
+
 ### Prerequisites for reads trimming at splice junctions
 In order to perform the optional reads trimming at splice junctions, GATK must be installed:
 - GATK [*GenomeAnalysisTK.jar*](https://software.broadinstitute.org/gatk/guide/quickstart)
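The hisat2 index is referred to by a filename prefix (e.g. *genome_tran* from the `hisat2-build` command in this hunk) rather than by a folder. A minimal bash sketch for checking that a prefix is complete, assuming the standard HISAT2 layout of eight files named `<prefix>.1.ht2` to `<prefix>.8.ht2` (or `.ht2l` for large indexes). `check_hisat2_index` is a hypothetical helper, not part of the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that all eight HISAT2 index files exist for a
# prefix, accepting either the .ht2 or the large-index .ht2l extension.
check_hisat2_index() {
  local prefix=$1 i
  for i in 1 2 3 4 5 6 7 8; do
    if [ ! -f "${prefix}.${i}.ht2" ] && [ ! -f "${prefix}.${i}.ht2l" ]; then
      echo "missing: ${prefix}.${i}.ht2"
      return 1
    fi
  done
  echo "ok: ${prefix}"
}

# Example: check the index built with hisat2-build above.
check_hisat2_index genome_tran || echo "build or download the index first"
```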
@@ -40,6 +52,11 @@ java -jar picard.jar CreateSequenceDictionary R=ref.fa O=ref.dict
 To run the pipeline on a series of paired-end fastq files (with suffixes *_1* and *_2*) in folder *fastq*, and a reference genome with indexes in folder *ref_genome*, one can type:
 ```bash
 nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --gendir ref_genome --suffix1 _1 --suffix2 _2
+```
+### Use hisat2 for mapping
+To use hisat2 instead of STAR for mapping, you must add the ***--hisat2* option**, specify the hisat2 index filename prefix with *--hisat2_idx*, and satisfy the requirements mentioned above. For example:
+```bash
+nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --suffix1 _1 --suffix2 _2 --hisat2 --hisat2_idx /home/user/reference/genome_tran
 ```
 ### Enable reads trimming at splice junctions
 To use the reads trimming at splice junctions step, you must add the ***--sjtrim* option**, specify the path to the folder containing the GenomeAnalysisTK jar file, and satisfy the requirements mentioned above. For example:
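The run examples in this hunk rely on paired fastq files distinguished by *--suffix1* and *--suffix2* (defaults *_1* and *_2*) and sharing the *--fastq_ext* extension (default *fq.gz*). A minimal pre-flight sketch, assuming files are named `<sample><suffix>.<ext>`, that lists complete pairs and reports first-mate files without a second mate. `list_pairs` is hypothetical and does not reproduce how the pipeline itself matches files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the sample name of every complete read pair in a
# folder; report first-mate files whose second mate is missing on stderr.
# Usage: list_pairs <folder> [suffix1] [suffix2] [extension]
list_pairs() {
  local dir=$1 s1=${2:-_1} s2=${3:-_2} ext=${4:-fq.gz}
  local r1 sample status=0
  for r1 in "$dir"/*"$s1.$ext"; do
    [ -e "$r1" ] || continue             # glob matched nothing
    sample=$(basename "$r1" "$s1.$ext")  # strip suffix and extension
    if [ -e "$dir/$sample$s2.$ext" ]; then
      echo "$sample"
    else
      echo "unpaired: $r1" >&2
      status=1
    fi
  done
  return "$status"
}

# Example with the defaults used in the commands above.
list_pairs fastq _1 _2 fq.gz || echo "some fastq files have no mate"
```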
@@ -60,18 +77,23 @@ nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --gendir ref_genome --su
 *--input_folder* | . | input folder |
 *--output_folder* | . | output folder |
 *--gendir* | ref | reference genome folder |
-*--cpu* | 8 | number of CPUs |
-*--mem* | 32 | memory |
+*--cpu* | 4 | number of CPUs |
+*--mem* | 50 | memory for mapping |
+*--memOther* | 2 | memory for QC and counting |
 *--fastq_ext* | fq.gz | extension of fastq files |
 *--suffix1* | \_1 | suffix for first element of read file pair |
 *--suffix2* | \_2 | suffix for second element of read file pair |
 *--output_folder* | . | output folder for aligned BAMs |
-*--fasta_ref* | ref.fa | reference genome fasta file for GATK |
 *--annot_gtf* | Homo_sapiens.GRCh38.79.gtf | annotation GTF file |
 *--annot_gff* | Homo_sapiens.GRCh38.79.gff | annotation GFF file |
+*--fasta_ref* | ref.fa | reference genome fasta file for GATK |
 *--GATK_folder* | GATK | folder with jar file GenomeAnalysisTK.jar |
 *--GATK_bundle* | GATK_bundle | folder with files for BQSR |
 *--intervals* | intervals.bed | bed file with intervals for BQSR |
 *--RG* | PL:ILLUMINA | string to be added to read group information in BAM file |
 *--sjtrim* | false | enable reads trimming at splice junctions |
 *--bqsr* | false | enable base quality score recalibration |
+*--gene_bed* | gene.bed | bed file with genes for RSeQC |
+*--stranded* | no | strand information for counting with htseq [no, yes, reverse] |
+*--hisat2* | false | use hisat2 instead of STAR for mapping |
+*--hisat2_idx* | genome_tran | index filename prefix for hisat2 |
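Several options in this table are meant to be combined, for instance *--hisat2* with *--hisat2_idx*, and *--stranded* only accepts *no*, *yes* or *reverse*. A minimal dry-run sketch that validates *--stranded* and prints the resulting command line instead of running it. The wrapper `print_run_cmd` and the folder names are hypothetical; only options listed in the table are used, with the new default values.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: check the --stranded value (no, yes, reverse)
# and print the nextflow command without running it.
# Usage: print_run_cmd <stranded> <hisat2_index_prefix>
print_run_cmd() {
  local stranded=$1 idx=$2
  case "$stranded" in
    no|yes|reverse) ;;
    *) echo "invalid --stranded value: $stranded" >&2; return 1 ;;
  esac
  echo "nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --suffix1 _1 --suffix2 _2" \
    "--hisat2 --hisat2_idx $idx --stranded $stranded --cpu 4 --mem 50 --memOther 2"
}

print_run_cmd reverse /home/user/reference/genome_tran
```

Once the printed command looks right, it can be pasted into the shell to start the actual run.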