8. [*htseq*](http://www-huber.embl.de/HTSeq/doc/install.html#install); the Python script htseq-count must also be in the PATH
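Whether a required tool such as htseq-count is reachable through the PATH can be checked before launching the pipeline. The following is a minimal sketch, not part of the pipeline itself; the helper name `require_tool` is hypothetical and only relies on the POSIX `command -v` builtin.

```bash
# Hypothetical helper (not provided by the pipeline): succeed only if the
# named executable can be found through the PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing from PATH: $1" >&2
    return 1
}

# Example check for the htseq-count script mentioned above.
require_tool htseq-count || echo "install HTSeq or add htseq-count to the PATH"
```

The same function can be called once per prerequisite, so that all missing tools are reported before any alignment starts.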
In addition, STAR requires genome indices that can be generated from a genome fasta file ref.fa and a splice junction annotation file ref.gtf using the following command:
```bash
STAR --runThreadN n --runMode genomeGenerate --genomeDir ref --genomeFastaFiles ref.fa --sjdbGTFfile ref.gtf --sjdbOverhang 99
```
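The value 99 given to `--sjdbOverhang` matches reads of length 100: the STAR manual recommends setting this option to the maximum read length minus 1. As a minimal sketch, assuming a hypothetical helper named `sjdb_overhang` (it is not part of STAR or of this pipeline), the value could be derived as follows:

```bash
# Hypothetical helper: compute a value for --sjdbOverhang from the
# maximum read length (read length minus 1).
sjdb_overhang() {
    # Reject empty input, leading zeros, and anything that is not a digit.
    case "$1" in
        ''|0*|*[!0-9]*)
            echo "read length must be a positive integer" >&2
            return 1
            ;;
    esac
    echo $(( $1 - 1 ))
}

# For 100 bp reads this prints: --sjdbOverhang 99
echo "--sjdbOverhang $(sjdb_overhang 100)"
```

For reads of a different length, the index would need to be regenerated with the corresponding overhang.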
-### Prerequisites for alignment with hisat2
+### Alignment with hisat2
In order to perform the optional alignment with hisat2, hisat2 must be installed:
- [GATK bundle](https://software.broadinstitute.org/gatk/download/bundle) VCF files with lists of indels and SNVs (recommended: 1000 Genomes indels VCF, Mills gold standard indels VCF, and dbSNP VCF)
- bed file with intervals to be considered
+### Clustering
+In order to perform the optional unsupervised analysis of read counts (PCA and consensus clustering), you need:
+- the unsupervised analysis R script [*RNAseq_unsupervised.R*](https://github.com/IARCbioinfo/RNAseq_analysis_scripts); this script must be in a folder listed in the PATH variable (e.g., in /usr/bin/)
+- [R and Rscript](https://cran.r-project.org) with packages ConsensusClusterPlus, ade4, DESeq2, fpc, and cluster
+
+## Input
+| Type | Description |
+|-----------|---------------|
+| --input_folder | a folder with fastq files or bam files |
| --clustering | perform unsupervised analyses of read count data |
+
+
## Usage
-To run the pipeline on a series of paired-end fastq files (with suffixes *_1* and *_2*) in folder *fastq*, and a reference genome with indexes in folder *ref_genome*, one can type:
+To run the pipeline on a series of paired-end fastq files (with suffixes *_1* and *_2*) in folder *fastq*, a reference genome with indexes in folder *ref_genome*, an annotation file ref.gtf, and a bed file ref.bed, one can type:
-To use the reads trimming at splice junctions step, you must add the ***--hisat2* option**, specify the path to the folder containing the hisat2 index files, as well as satisfy the requirements above mentionned. For example:
+To use hisat2 instead of STAR for read mapping, you must add the ***--hisat2* option**, specify the path to the folder containing the hisat2 index files (genome_tran.1.ht2 to genome_tran.8.ht2), and satisfy the requirements mentioned above. For example:
Note that parameter '--hisat2_idx' is the prefix of the index files, not the entire path to .ht2 files.
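To illustrate the difference between the index prefix and the path to an individual .ht2 file, here is a minimal sketch using only shell parameter expansion. The helper name `hisat2_prefix` and the directory `/path/to/index` are hypothetical; only the file name pattern genome_tran.1.ht2 comes from the text above.

```bash
# Hypothetical helper: derive the index prefix from the name of any one
# of the .ht2 index files.
hisat2_prefix() {
    base="${1##*/}"      # drop the directory part
    base="${base%.ht2}"  # drop the ".ht2" extension
    echo "${base%.*}"    # drop the trailing ".<number>"
}

hisat2_prefix /path/to/index/genome_tran.1.ht2   # prints genome_tran
```

Here `genome_tran` is the prefix shared by all eight index files, which is the kind of value the note above says '--hisat2_idx' expects.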
+
### Enable reads trimming at splice junctions
To use the reads trimming at splice junctions step, you must add the ***--sjtrim* option**, specify the path to the folder containing the GenomeAnalysisTK jar file, and satisfy the requirements mentioned above. For example:
To use the base quality score recalibration step, you must add the ***--bqsr* option**, specify the path to the folder containing the GenomeAnalysisTK jar file, the path to the GATK bundle folder for your reference genome, and the path to the bed file with intervals to be considered, and satisfy the requirements mentioned above. For example:
You can also specify options n, t, c, and l (see [*RNAseq_unsupervised.R*](https://github.com/IARCbioinfo/RNAseq_analysis_scripts)) of script RNAseq_unsupervised.R using options '--clustering_n', '--clustering_t', '--clustering_c', and '--clustering_l'.
+
+
+## Output
+| Type | Description |
+|-----------|---------------|
+| file.bam | BAM files of alignments or realignments |
+| file.bam.bai | BAI files of alignments or realignments |