Commit ae889d3

Merge pull request #5 from IARCbioinfo/dev
Dev
2 parents ddfddbb + 87a1b64 commit ae889d3

4 files changed

+11479
-114
lines changed

README.md

Lines changed: 26 additions & 3 deletions
@@ -11,6 +11,7 @@ The following programs need to be installed and in the PATH environment variable
1111
- [*fastqc*](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/INSTALL.txt)
1212
- [*cutadapt*](http://cutadapt.readthedocs.io/en/stable/installation.html), which requires Python version > 2.7
1313
- [*trim_galore*](https://github.com/FelixKrueger/TrimGalore)
14+
- [*RSeQC*](http://rseqc.sourceforge.net/)
1415
- [*multiQC*](http://multiqc.info/docs/)
1516
- [*STAR*](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf)
1617
- [*htseq*](http://www-huber.embl.de/HTSeq/doc/install.html#install); the python script htseq-count must also be in the PATH
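
The PATH requirement above can be checked before launching the pipeline. The following is a minimal sketch, not part of the pipeline itself: the function name `missing_tools` is hypothetical, and the executable names passed to it in the usage note are assumptions about how each tool is installed on a given system.

```bash
# Print the names of the given executables that are not found in PATH,
# separated by spaces (prints an empty line when all are found).
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing
  echo "${missing# }"
}
```

For example, `missing_tools fastqc cutadapt trim_galore multiqc STAR htseq-count` would list whichever of these executables still needs to be installed or added to the PATH.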
@@ -21,6 +22,17 @@ In addition, STAR requires genome indices that can be generated from a genome fa
2122
STAR --runThreadN n --runMode genomeGenerate --genomeDir ref --genomeFastaFiles ref.fa --sjdbGTFfile ref.gtf --sjdbOverhang 99
2223
```
2324

25+
### Prerequisites for alignment with hisat2
26+
In order to perform the optional alignment with hisat2, hisat2 must be installed:
27+
- [*hisat2*](https://ccb.jhu.edu/software/hisat2/index.shtml)
28+
29+
In addition, index files (*.ht2*) must be downloaded from [*hisat2*](https://ccb.jhu.edu/software/hisat2/index.shtml), or generated from a reference fasta file (e.g., reference.fa) and a GTF annotation file (e.g., reference.gtf) using the following commands:
30+
```bash
31+
extract_splice_sites.py reference.gtf > genome.ss
32+
extract_exons.py reference.gtf > genome.exon
33+
hisat2-build reference.fa --ss genome.ss --exon genome.exon genome_tran
34+
```
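
After building or downloading the index, it can help to confirm that the *.ht2* files are where the pipeline will look for them. The sketch below assumes the usual hisat2 naming scheme, in which `hisat2-build` writes files named `<prefix>.1.ht2`, `<prefix>.2.ht2`, and so on; the function name `count_ht2` is hypothetical.

```bash
# Count the hisat2 index files that share a given prefix,
# e.g. genome_tran.1.ht2, genome_tran.2.ht2, ...
count_ht2() {
  prefix=$1
  n=0
  for f in "$prefix".*.ht2; do
    # When nothing matches, the glob stays literal, so test for existence
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

Calling `count_ht2 genome_tran` in the folder where the index was built prints the number of index files found; a result of 0 means the prefix does not match any *.ht2* file.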
35+
2436
### Prerequisites for reads trimming at splice junctions
2537
In order to perform the optional reads trimming at splice junctions, GATK must be installed:
2638
- GATK [*GenomeAnalysisTK.jar*](https://software.broadinstitute.org/gatk/guide/quickstart)
@@ -40,6 +52,11 @@ java -jar picard.jar CreateSequenceDictionary R= ref.fa O= ref.dict
4052
To run the pipeline on a series of paired-end fastq files (with suffixes *_1* and *_2*) in folder *fastq*, and a reference genome with indexes in folder *ref_genome*, one can type:
4153
```bash
4254
nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --gendir ref_genome --suffix1 _1 --suffix2 _2
55+
```
56+
### Use hisat2 for mapping
57+
To use hisat2 instead of STAR for mapping, you must add the ***--hisat2* option**, specify the hisat2 index filename prefix (including the path to its folder) with *--hisat2_idx*, and satisfy the requirements mentioned above. For example:
58+
```bash
59+
nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --suffix1 _1 --suffix2 _2 --hisat2 --hisat2_idx /home/user/reference/genome_tran
4360
```
4461
### Enable reads trimming at splice junctions
4562
To use the reads trimming at splice junctions step, you must add the ***--sjtrim* option**, specify the path to the folder containing the GenomeAnalysisTK jar file, and satisfy the requirements mentioned above. For example:
@@ -60,18 +77,24 @@ nextflow run iarcbioinfo/RNAseq-nf --input_folder fastq --gendir ref_genome --su
6077
*--input_folder* | . | input folder |
6178
*--output_folder* | . | output folder |
6279
*--gendir* | ref | reference genome folder |
63-
*--cpu* | 8 | number of CPUs |
64-
*--mem* | 32 | memory|
80+
*--cpu* | 4 | number of CPUs |
81+
*--mem* | 50 | memory for mapping|
82+
*--memOther* | 2 | memory for QC and counting|
6583
*--fastq_ext* | fq.gz | extension of fastq files|
6684
*--suffix1* | \_1 | suffix for first element of read files pair|
6785
*--suffix2* | \_2 | suffix for second element of read files pair|
6886
*--output_folder* | . | output folder for aligned BAMs|
69-
*--fasta_ref* | ref.fa | reference genome fasta file for GATK |
7087
*--annot_gtf* | Homo_sapiens.GRCh38.79.gtf | annotation GTF file |
7188
*--annot_gff* | Homo_sapiens.GRCh38.79.gff | annotation GFF file |
89+
*--fasta_ref* | ref.fa | reference genome fasta file for GATK |
7290
*--GATK_folder* | GATK | folder with jar file GenomeAnalysisTK.jar |
7391
*--GATK_bundle* | GATK_bundle | folder with files for BQSR |
7492
*--intervals* | intervals.bed | bed file with intervals for BQSR |
7593
*--RG* | PL:ILLUMINA | string to be added to read group information in BAM file |
7694
*--sjtrim* | false | enable reads trimming at splice junctions |
7795
*--bqsr* | false | enable base quality score recalibration |
96+
*--gene_bed* | gene.bed | bed file with genes for RSeQC |
97+
*--stranded* | no | Strand information for counting with htseq [no, yes, reverse] |
98+
*--stranded* | no | Strand information for counting with htseq [no, yes, reverse] |
99+
*--hisat2* | false | use hisat2 instead of STAR for mapping |
100+
*--hisat2_idx* | genome_tran | index filename prefix for hisat2 |
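
As an illustration of how the options in this table combine, the sketch below prints (without running) a hisat2-mode command line and rejects *--stranded* values outside the documented set [no, yes, reverse]. The helper name `hisat2_cmd` and its argument order are hypothetical; only the option names and the example values come from the README.

```bash
# Print a hisat2-mode command line for the pipeline without executing it.
# $1: fastq folder, $2: hisat2 index prefix, $3: number of CPUs,
# $4: strand mode for htseq counting (no, yes or reverse)
hisat2_cmd() {
  case "$4" in
    no|yes|reverse) ;;
    *) echo "invalid --stranded value: $4" >&2; return 1 ;;
  esac
  printf 'nextflow run iarcbioinfo/RNAseq-nf --input_folder %s --suffix1 _1 --suffix2 _2 --hisat2 --hisat2_idx %s --cpu %s --stranded %s\n' \
    "$1" "$2" "$3" "$4"
}
```

For instance, `hisat2_cmd fastq /home/user/reference/genome_tran 4 reverse` prints a command that can be reviewed before being pasted into a terminal.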
