6. [*htseq*](http://www-huber.embl.de/HTSeq/doc/install.html#install); the Python script htseq-count must also be in the PATH

+**A singularity container is available with all the tools needed to run the pipeline (see "Usage")**
+
+### References
A bundle with reference genome and corresponding annotations for STAR is available at https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/.

Alternatively, STAR genome indices can be generated from a genome fasta file ref.fa and a splice junction annotation file ref.gtf using the following command:
@@ -66,7 +69,7 @@ In order to perform the optional base quality score recalibration, several files
|--input_folder | a folder with fastq files or bam files |
|--input_file | input tab-separated values (TSV) file with columns SM (sample name), RG (read group), pair1 (first fastq file of the pair), and pair2 (second fastq file of the pair) |

-Note that there are two input methods--folder and file. Although the input folder method is the easiest because it does not require to create an input file with the right format, the input file mode is recommended in cases when a single sample has multiple paired files (e.g., due to multiplexed sequencing); in that case, users should have one line per pair of file and put a same SM identifier so that the workflow can group them into the same output bam file.
+Note that there are two input methods: folder and file. Although the input folder method is the easiest because it does not require creating an input file with the right format, the input file method is recommended when a single sample has multiple pairs of files (e.g., due to multiplexed sequencing); in that case, users should provide one line per pair of files and use the same SM identifier so that the workflow can group them into the same output bam file.
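
The input file method described above can be sketched as follows. This is a minimal illustration in Python, not part of the pipeline: the sample names, read-group labels, and fastq file names are made up, and the helper `group_pairs_by_sample` is hypothetical. It parses a tab-separated table with the columns SM, RG, pair1, and pair2 and groups the fastq pairs by SM, mirroring the grouping the workflow is described as doing before writing one bam file per sample.

```python
import csv
import io
from collections import defaultdict

# Hypothetical input file content: sample S1 was sequenced on two lanes,
# so it has two lines that share the same SM identifier.
EXAMPLE_TSV = (
    "SM\tRG\tpair1\tpair2\n"
    "S1\tlane1\tS1_L1_1.fastq.gz\tS1_L1_2.fastq.gz\n"
    "S1\tlane2\tS1_L2_1.fastq.gz\tS1_L2_2.fastq.gz\n"
    "S2\tlane1\tS2_L1_1.fastq.gz\tS2_L1_2.fastq.gz\n"
)


def group_pairs_by_sample(tsv_text):
    """Return {SM: [(RG, pair1, pair2), ...]} from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row["SM"]].append((row["RG"], row["pair1"], row["pair2"]))
    return dict(groups)


groups = group_pairs_by_sample(EXAMPLE_TSV)
for sample, pairs in groups.items():
    print(sample, len(pairs), "pair(s)")
```

Keeping one line per pair, rather than one line per sample, leaves the read group attached to each pair while the shared SM value carries the grouping.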
## Parameters
@@ -95,9 +98,11 @@ In order to perform the optional base quality score recalibration, several files
|--ref | ref.fa | reference genome fasta file for GATK |
|--snp_vcf | dbsnp.vcf | VCF file with known variants for GATK BQSR |
|--indel_vcf | Mills_100G_indels.vcf | VCF file with known indels for GATK BQSR |
+|--STAR_mapqUnique | 255 | STAR default mapping quality for unique mappers |
|--RG | PL:ILLUMINA | string to be added to read group information in BAM file |
|--stranded | no | Strand information for counting with htseq [no, yes, reverse] |
|--hisat2_idx | genome_tran | index filename prefix for hisat2 |
+|--htseq_maxreads | 30000000 | Maximum number of reads taken into account by htseq-count |
|--multiqc_config | null | config yaml file for multiqc |
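
As a rough illustration of how the defaults in this table might be checked before a run, the sketch below copies a few defaults from the rows above into a Python dictionary and validates user overrides. The function name `resolve_params` and the validation rules (for instance, that `htseq_maxreads` must be a positive integer) are assumptions made for this example; the pipeline's own checks may differ.

```python
# Defaults copied from the table rows above (subset only).
DEFAULTS = {
    "STAR_mapqUnique": 255,
    "RG": "PL:ILLUMINA",
    "stranded": "no",
    "hisat2_idx": "genome_tran",
    "htseq_maxreads": 30000000,
    "multiqc_config": None,  # shown as "null" in the table
}

# Values listed for --stranded in the table.
STRANDED_CHOICES = ("no", "yes", "reverse")


def resolve_params(overrides):
    """Merge user overrides into the defaults and run basic sanity checks.

    Hypothetical helper: the checks are assumptions, not the pipeline's logic.
    """
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULTS, **overrides}
    if params["stranded"] not in STRANDED_CHOICES:
        raise ValueError(f"--stranded must be one of {STRANDED_CHOICES}")
    maxreads = params["htseq_maxreads"]
    if not isinstance(maxreads, int) or maxreads <= 0:
        raise ValueError("--htseq_maxreads must be a positive integer")
    return params


params = resolve_params({"stranded": "reverse"})
print(params["stranded"], params["htseq_maxreads"])
```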
@@ -115,8 +120,10 @@ In order to perform the optional base quality score recalibration, several files
## Usage
To run the pipeline on a series of paired-end fastq files (with suffixes *_1* and *_2*) in folder *fastq*, a reference genome with indexes in folder *ref_genome*, an annotation file ref.gtf, and a bed file ref.bed, one can type:
To run the pipeline without singularity, just remove "-profile singularity".
+
### Use hisat2 for mapping
To use hisat2 instead of STAR for read mapping, you must add the **--hisat2** option, specify the path to the folder containing the hisat2 index files (genome_tran.1.ht2 to genome_tran.8.ht2), and satisfy the requirements mentioned above. For example:
if ( !(keys1.containsAll(keys2)) || !(keys2.containsAll(keys1)) ) { println "\n ERROR : At least one fastq file has no mate, please check your fastq files."; System.exit(1) }
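
The check above compares two key collections in both directions, which amounts to testing that they contain the same keys. A minimal Python sketch of the same idea follows. It assumes that mates are distinguished by the suffixes `_1` and `_2`, as in the usage text; the function name `unpaired_keys`, the `.fastq.gz` extension, and the file names are hypothetical.

```python
def unpaired_keys(fastq_names, suffix1="_1", suffix2="_2", ext=".fastq.gz"):
    """Return the sorted sample keys that have only one of the two mates."""
    end1, end2 = suffix1 + ext, suffix2 + ext
    keys1 = {n[: -len(end1)] for n in fastq_names if n.endswith(end1)}
    keys2 = {n[: -len(end2)] for n in fastq_names if n.endswith(end2)}
    # Symmetric difference: keys present in one set but not in the other.
    # It is empty exactly when each set contains all keys of the other.
    return sorted(keys1 ^ keys2)


files = ["A_1.fastq.gz", "A_2.fastq.gz", "B_1.fastq.gz"]
missing = unpaired_keys(files)
if missing:
    print("ERROR: fastq file(s) without a mate for:", ", ".join(missing))
```

Returning the offending keys, rather than only a yes/no answer, lets the error message name the samples that need attention.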