Alignment and counting of reads from QuantseqFWD Experiments

Introduction

This Quantseq pipeline converts short read .fastq files from 3' end sequencing into counts tables. The pipeline requires an index of the genome for the START aligner. After trimming to remove polyA sequences and adapters, the reads are aligned to the genome with STAR. Reads aligning to sequences adjacent to probable genomic internal polyA priming sites are removed. Reads are assigned to transcripts in a provided .gtf file and then a table a counts per transcript is generated. This table can be used for statistical analysis using DESeq2 or other statistical analysis software. This pipeline is comprised of shell scripts that can be executed on SLURM using the provided wrapper script. Conda environments are not provided, so please consult with your HPC administrator to run this.

Scripts
Data files in repository
Additional useful files not included
Example 1 - run individual scripts on included test files
Example 2 - run the entire pipeline on a single sample with slurm wrapper script
Expected output of test
Packages used to test pipeline

Table of contents generated with markdown-toc

Scripts

Script	Description
QuantSeqSlurmWrapper.sh	Runs all necessary scripts to align and process quantseq reads with slurm. You must provide a pre-indexed genome for STAR.
TrimReads.sh	Trims reads with bbmap
QuantSeqFilterAndCounts.sh	This will filter reads from a QuantseqFWD experiment that end at a probable genomic poly-A priming site. It will then count the remaining reads over transcript feature coordinates defined in the gtf file.
AlignWithSTAR.sh	Aligns reads to genome with STAR
makeTestSTARIndex.sh	Makes the STAR index for the test data set.
GetInternalPrimingSites.R	This takes a fasta of sequences +/-10bp around putative polyadenylation sites. It checks the genomic sequence in the region [–10..10] surrounding the polyadenylation events and filtered out alignments containing stretches of six consecutive A or with 70% A coverage in any 10-nt sub-window in this region. A bed file of false polyadenylation sites is returned. This is used in QuantSeqFilterAndCounts.sh.

Data files in repository

File	Description
refFiles directory
polyA.fa.gz	fasta with polyA for bbmap trimming
truseq.fa.gz	truseq adapter sequences for bbmap trimming
testFiles directory
test.chrm.lengths	chromosome length for test data set
test.fa	genome sequence for test data set
test.fastq	Quantseq sequencing reads for test data set
test.gtf	transcript feature coordinates for test data set

Additional useful files not included

####note: these are not needed for the test run

Zebrafish gtf file - from https://www.umassmed.edu/globalassets/lawson-lab/downloadfiles/v4.3.2.gtf
Index of Zebrafish genome for STAR - from https://galaxyweb.umassmed.edu/pub/dnext_data/genome_data/zebrafish/GRCz11/refseq_v4.3.2/STARIndex/
Genome sequence in fasta format

Example 1 - run individual scripts on included test files

Create a directory for test output

mkdir test_outputdir

Make test STAR index

mkdir starTestIndex
sbatch --cpus-per-task 6 --mem 48G scripts/makeTestSTARIndex.sh

Trim reads

sbatch scripts/TrimReads.sh \
testFiles/test.fastq \
test_outputdir \
refFiles/polyA.fa.gz,refFiles/truseq.fa.gz

Align reads to genome with STAR

sbatch --cpus-per-task 6 --mem 48G scripts/AlignWithSTAR.sh \
testFiles/test.fastq \
starTestIndex/ \
test_outputdir

Remove probable internal priming reads and count reads over transcripts

sbatch scripts/QuantSeqFilterAndCounts.sh \
test \
testFiles/test.chrm.lengths \
testFiles/test.fa \
testFiles/test.gtf \
test_outputdir

Example 2 - run the entire pipeline on a single sample with slurm wrapper script

Arguments:

Position	Description	Example for test
1	output directory name	test2_outputdir
2	name of fastq file with Quantseq reads	testFiles/test.fastq
3	reference files for trimming	refFiles/polyA.fa.gz,refFiles/truseq.fa.gz
4	name of directory with the genome index for STAR	starTestIndex/
5	chromosome sizes file	testFiles/test.chrm.lengths
6	genome sequence fasta file	testFiles/test.fa
7	gtf file for transcriptome	testFiles/test.gtf

sbatch scripts/QuantSeqSlurmWrapper.sh \
test2_outputdir \
testFiles/test.fastq \
refFiles/polyA.fa.gz,refFiles/truseq.fa.gz \
starTestIndex/ \
testFiles/test.chrm.lengths \
testFiles/test.fa \
testFiles/test.gtf

Expected output of test

File	Description
starTestIndex/	output from makeTestSTARIndex.sh; folder with files of STAR index
test_trimmed.fastq	output from TrimReads.sh; trimmed version of the input fastq - test.fastq
testAligned.sortedByCoord.out.bam	output from AlignWithSTAR.sh; aligned reads
testLog.final.out	output from AlignWithSTAR.sh; see STAR manual
testLog.out	output from AlignWithSTAR.sh; see STAR manual
testLog.progress.out	output from AlignWithSTAR.sh; see STAR manual
testSJ.out.tab	output from AlignWithSTAR.sh; see STAR manual
test_positive_strand.bam	output from QuantSeqFilterAndCounts.sh; reads mapping to positive strand
test_positive_strand.bam.bai	output from QuantSeqFilterAndCounts.sh
test_negative_strand.bam	output from QuantSeqFilterAndCounts.sh; reads mapping to negative strand
test_negative_strand.bam.bai	output from QuantSeqFilterAndCounts.sh
test_positive_strand_flank.bed	coorindates +/- 10 bp around ends of positive strand mapped reads
test_positive_strand_flank.fasta	output from QuantSeqFilterAndCounts.sh; genomic sequences of test_positive_strand_flank.bed
test_negative_strand_flank.bed	output from QuantSeqFilterAndCounts.sh; coorindates +/- 10 bp around starts of positive strand mapped reads
test_negative_strand_flank.fasta	output from QuantSeqFilterAndCounts.sh; genomic sequences of test_negative_strand_flank.bed
test_positive_strand_internalPriming.bed	output from QuantSeqFilterAndCounts.sh and GetInternalPrimingSites.R; coordinates of positive strand read ends with likely genomic polyA internalpriming
test_negative_strand_internalPriming.bed	output from QuantSeqFilterAndCounts.sh and GetInternalPrimingSites.R; ; coordinates of negative strand read starts with likely genomic polyA internal priming
test_positive_strand_internalPrimingReads.txt	output from QuantSeqFilterAndCounts.sh; IDs of reads at likely internal priming sites
test_negative_strand_internalPrimingReads.txt	output from QuantSeqFilterAndCounts.sh; IDs of reads at likely internal priming sites
test_positive_strand_filtered.bam	output from QuantSeqFilterAndCounts.sh; mapped reads after removal of those listed in test_positive_strand_internalPrimingReads.txt
test_negative_strand_filtered.bam	output from QuantSeqFilterAndCounts.sh; mapped reads after removal of those listed in test_negative_strand_internalPrimingReads.txt
test_filtered.bam	output from QuantSeqFilterAndCounts.sh; merged test_positive_strand_filtered.bam and test_negative_strand_filtered.bam
test_filtered.bam.csi	output from QuantSeqFilterAndCounts.sh
test_counts.tab	output from QuantSeqFilterAndCounts.sh; read counts over transcripts; made with htseq-count

Packages used to test pipeline

Package	Version
slurm	20.02
samtools	1.11
bedtools	2.30.0
bedops	2.4.3
picard	2.21.2
python	3.7.0
htseq	0.13.5
bbmap	36.99
star	2.6.1
R	4.0.3
zoo (R package)	1.8-8

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
refFiles		refFiles
scripts		scripts
testFiles		testFiles
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Untitled.png		Untitled.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Alignment and counting of reads from QuantseqFWD Experiments

Introduction

Scripts

Data files in repository

Additional useful files not included

Example 1 - run individual scripts on included test files

Create a directory for test output

Make test STAR index

Trim reads

Align reads to genome with STAR

Remove probable internal priming reads and count reads over transcripts

Example 2 - run the entire pipeline on a single sample with slurm wrapper script

Arguments:

Expected output of test

Packages used to test pipeline

About

Uh oh!

Releases 2

Packages

Languages

License

SansamLab-Pipelines-Genomics/QuantSeqPipeline

Folders and files

Latest commit

History

Repository files navigation

Alignment and counting of reads from QuantseqFWD Experiments

Introduction

Scripts

Data files in repository

Additional useful files not included

Example 1 - run individual scripts on included test files

Create a directory for test output

Make test STAR index

Trim reads

Align reads to genome with STAR

Remove probable internal priming reads and count reads over transcripts

Example 2 - run the entire pipeline on a single sample with slurm wrapper script

Arguments:

Expected output of test

Packages used to test pipeline

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Languages

Packages