
Commit 996ee8e

Merge pull request #109 from bnovak32/DEV_Methyl-Seq
Clipping and performance updates: added clipping parameters for random-priming library types; added the gzip parameter to the bismark alignment command to reduce intermediate file size
2 parents fa107a0 + 2eb2b69 commit 996ee8e


1 file changed: +33 -0 lines changed


Methyl-Seq/Pipeline_GL-DPPD-7113_Versions/GL-DPPD-7113.md

Lines changed: 33 additions & 0 deletions
@@ -151,6 +151,8 @@ multiqc --interactive \
 ## 2. Adapter trimming/quality filtering
 See `trim_galore --help` or [TrimGalore User Guide](https://github.com/FelixKrueger/TrimGalore/blob/0.6.10/Docs/Trim_Galore_User_Guide.md) for more info on any of the below.
 
+Additionally, the Bismark documentation includes guidelines for specific Methyl-Seq library types: [Bismark library type guide](http://felixkrueger.github.io/Bismark/bismark/library_types/). Some library types require additional 5' and/or 3' hard trimming to remove the signature of the oligos used for random priming. Leaving these bases in place may cause misalignments and methylation biases.
+
 <br>
 
 ### If not RRBS or if RRBS using MseI digestion
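Because some library types need only 5' clipping, only 3' clipping, or both, a wrapper script may want to emit only the clipping flags that are actually needed. The bash sketch below assumes a hypothetical helper, `clip_args` (not part of Trim Galore or this pipeline), that takes the four lengths in the order R1 5', R2 5', R1 3', R2 3' and skips any length of 0. The `0 10 0 0` call is a made-up illustration, not a recommended setting for any kit; use the Bismark library type guide for real values.

```bash
# Hypothetical helper (not part of Trim Galore): print the Trim Galore
# hard-clipping flags for the given lengths, skipping any length of 0.
# Argument order: 5' R1, 5' R2, 3' R1, 3' R2 (missing arguments count as 0).
clip_args() {
  local flags=(--clip_R1 --clip_R2 --three_prime_clip_R1 --three_prime_clip_R2)
  local values=("$@")
  local out=()
  local i n
  for i in 0 1 2 3; do
    n="${values[$i]:-0}"
    # Reject anything that is not a non-negative integer.
    if ! [[ "$n" =~ ^[0-9]+$ ]]; then
      echo "clip_args: invalid length '$n'" >&2
      return 1
    fi
    # 10# forces base-10 so values such as "08" are not read as octal.
    if (( 10#$n > 0 )); then
      out+=("${flags[$i]}" "$n")
    fi
  done
  printf '%s\n' "${out[*]}"
}

# TruSeq (EpiGnome) values from the paired-end example in this step
clip_args 8 8 8 8
# Hypothetical library needing only 5' clipping of R2
clip_args 0 10 0 0
```

Inside a command this could be used as `trim_galore ... $(clip_args 8 8 8 8) ...`; the unquoted substitution relies on word splitting, which is safe here because every value is a plain integer.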
@@ -186,6 +188,30 @@ mv sample-1_R2_raw_val_2.fq.gz sample-1_R2_trimmed.fastq.gz
 
 <br>
 
+### If using a random priming post-bisulfite method
+(such as TruSeq (formerly EpiGnome), PBAT, scBSSeq, Pico Methyl, Accel, etc.)
+Random priming is not truly random, and the signature it leaves at the ends of the reads can introduce errors, indels, and methylation biases. Add the optional clipping parameters (`--clip_R1`, `--clip_R2`, `--three_prime_clip_R1`, and `--three_prime_clip_R2`) to remove the random-priming signature from the 5' end of each read and from the 3' end after adapter trimming. See the [Bismark library type guide](http://felixkrueger.github.io/Bismark/bismark/library_types/) for more detailed information.
+
+**Paired-end example for TruSeq (EpiGnome) library prep**
+```bash
+trim_galore --gzip \
+--cores NumberOfThreads \
+--phred33 \
+--output_dir trimmed_reads_out_dir/ \
+--paired \
+--clip_R1 8 \
+--clip_R2 8 \
+--three_prime_clip_R1 8 \
+--three_prime_clip_R2 8 \
+sample-1_R1_raw.fastq.gz sample-1_R2_raw.fastq.gz
+
+# renaming outputs to use GeneLab standard suffix
+mv sample-1_R1_raw_val_1.fq.gz sample-1_R1_trimmed.fastq.gz
+mv sample-1_R2_raw_val_2.fq.gz sample-1_R2_trimmed.fastq.gz
+```
+
+<br>
+
 ### If RRBS with MspI digestion
 Note that if the library preparation was non-directional, the `--non_directional` flag needs to be added to this command (whether single-end or paired-end; see [TrimGalore User Guide](https://github.com/FelixKrueger/TrimGalore/blob/0.6.10/Docs/Trim_Galore_User_Guide.md#rrbs-specific-options-mspi-digested-material)).
 
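When several samples share the TruSeq (EpiGnome) library prep, the paired-end example above can be wrapped in a loop. The sketch below is a dry run by default: it only prints the commands so they can be checked before executing. The sample names, the thread count of 4, and the `run` and `trim_all` helpers are hypothetical. Unlike the example, the `mv` paths include the output directory, since Trim Galore writes its outputs to `--output_dir`.

```bash
set -euo pipefail

# Hypothetical inputs: replace with the real sample IDs and thread count.
samples=(sample-1 sample-2)
threads=4
outdir=trimmed_reads_out_dir
DRY_RUN=1   # 1 = print commands only; 0 = execute them

# Print the command when DRY_RUN=1, otherwise run it.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Trim and rename each paired-end TruSeq (EpiGnome) sample given as an argument.
trim_all() {
  local s
  for s in "$@"; do
    # Hard-clip 8 bp from the 5' and 3' ends of both R1 and R2
    run trim_galore --gzip \
      --cores "$threads" \
      --phred33 \
      --output_dir "$outdir/" \
      --paired \
      --clip_R1 8 \
      --clip_R2 8 \
      --three_prime_clip_R1 8 \
      --three_prime_clip_R2 8 \
      "${s}_R1_raw.fastq.gz" "${s}_R2_raw.fastq.gz"

    # Rename Trim Galore's *_val_1/*_val_2 outputs to the GeneLab *_trimmed suffix
    run mv "$outdir/${s}_R1_raw_val_1.fq.gz" "$outdir/${s}_R1_trimmed.fastq.gz"
    run mv "$outdir/${s}_R2_raw_val_2.fq.gz" "$outdir/${s}_R2_trimmed.fastq.gz"
  done
}

trim_all "${samples[@]}"
```

Setting `DRY_RUN=0` makes `run` execute each command instead of printing it; passing arguments through `"$@"` keeps file names with unusual characters intact.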
@@ -302,6 +328,10 @@ mv sample-1_R2_trimmed.fastq_trimmed.fq.gz sample-1_R2_trimmed.fastq.gz
 * `-a2` - specific adapter sequence to be trimmed off of reverse reads (applicable for libraries prepared with the NuGEN ovation kit)
 * `--paired` - specifies data are paired-end
 * `--output_dir` - the output directory to store results
+* `--clip_R1` - number of bases to trim off the 5' end of each R1 read (optional; for use with library prep kits that use random priming, such as TruSeq (EpiGnome))
+* `--clip_R2` - number of bases to trim off the 5' end of each R2 read (optional; for use with library prep kits that use random priming, such as TruSeq (EpiGnome))
+* `--three_prime_clip_R1` - number of bases to trim off the 3' end of each R1 read AFTER adapter trimming (optional; for use with library prep kits that use random priming, such as TruSeq (EpiGnome))
+* `--three_prime_clip_R2` - number of bases to trim off the 3' end of each R2 read AFTER adapter trimming (optional; for use with library prep kits that use random priming, such as TruSeq (EpiGnome))
 * positional arguments represent the input read files, 2 of them if paired-end data
 
 
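Because the 3' clip is applied after adapter trimming, the clipped bases add to whatever adapter trimming already removed. The bash sketch below shows that arithmetic with a hypothetical helper, `remaining_length` (not a Trim Galore function), using the TruSeq (EpiGnome) values of 8 bp at each end. The 150 bp and 100 bp read lengths are assumptions for illustration.

```bash
# Hypothetical helper: length left after 5' and 3' hard clipping,
# given the length that survived adapter/quality trimming.
# Usage: remaining_length <len_after_adapter_trim> <clip_5prime> <clip_3prime>
remaining_length() {
  local len=$1 clip5=$2 clip3=$3
  local left=$(( len - clip5 - clip3 ))
  # Clipping cannot remove more bases than the read has.
  if (( left < 0 )); then
    left=0
  fi
  echo "$left"
}

# No adapter found in a 150 bp read: 150 - 8 - 8
remaining_length 150 8 8   # prints 134
# Adapter trimming already shortened the read to 100 bp: 100 - 8 - 8
remaining_length 100 8 8   # prints 84
```

Short inserts therefore lose a larger fraction of their length to clipping. Trim Galore also discards reads that fall below its `--length` cutoff (20 bp by default), so heavily clipped short inserts may be dropped entirely.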
@@ -459,6 +489,7 @@ bismark --bowtie2 \
 --parallel NumberOfThreads \
 --non_bs_mm \
 --nucleotide_coverage \
+--gzip \
 --output_dir mapping_files_out_dir/ \
 --genome_folder bismark_reference_genome/ \
 sample-1_trimmed.fastq.gz
@@ -478,6 +509,7 @@ bismark --bowtie2 \
 --parallel NumberOfThreads \
 --non_bs_mm \
 --nucleotide_coverage \
+--gzip \
 --output_dir mapping_files_out_dir/ \
 --genome_folder bismark_reference_genome/ \
 -1 sample-1_R1_trimmed.fastq.gz \
@@ -497,6 +529,7 @@ mv sample-1_R1_trimmed_bismark_bt2_pe.bam sample-1_bismark_bt2_pe.bam
 * `--parallel` - allows us to specify the number of threads to use (note: will consume 3-5X this value)
 * `--non_bs_mm` - outputs an extra column in the bam file specifying the number of non-bisulfite mismatches each read has
 * `--nucleotide_coverage` - outputs a table with mono- and di-nucleotide sequence compositions and coverage values compared to genomic compositions
+* `--gzip` - write temporary bisulfite conversion files in gzip format to save disk space during alignment
 * `--output_dir` - the output directory to store results
 * `--genome_folder` - specifies the directory holding the reference genome indexes (the same that was provided in [Step 4a.](#4a-generate-reference) above)
 * input trimmed-reads are provided as a positional argument if they are single-end data
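Because `--gzip` is the same addition in both the single-end and paired-end Bismark commands, a wrapper can keep one shared option list and append only the input arguments. The dry-run sketch below uses a hypothetical helper, `bismark_cmd`, with a thread count of 4. It includes only the options visible in this diff, so the full pipeline command may contain more. It prints the command rather than running Bismark.

```bash
set -euo pipefail

# Hypothetical helper: print a Bismark command built from the options shown
# in this commit (the full pipeline command may contain more options).
# Usage: bismark_cmd <threads> <read1> [read2]
bismark_cmd() {
  local threads=$1 r1=$2 r2=${3:-}
  # --gzip writes the temporary bisulfite-converted files gzip-compressed.
  local args=(
    --bowtie2
    --parallel "$threads"
    --non_bs_mm
    --nucleotide_coverage
    --gzip
    --output_dir mapping_files_out_dir/
    --genome_folder bismark_reference_genome/
  )
  if [[ -n "$r2" ]]; then
    # Paired-end inputs go to -1/-2
    args+=(-1 "$r1" -2 "$r2")
  else
    # Single-end input is a positional argument
    args+=("$r1")
  fi
  printf '%s\n' "bismark ${args[*]}"
}

bismark_cmd 4 sample-1_trimmed.fastq.gz
bismark_cmd 4 sample-1_R1_trimmed.fastq.gz sample-1_R2_trimmed.fastq.gz
```

Keeping the shared options in one array means a flag such as `--gzip` is added in a single place rather than separately in the single-end and paired-end commands.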
