06 Aug 15:41

FelixKrueger

44618d2

v0.25.1 - tolerate + symbol for UMIs for bclconvert deduplication

The + sign is now accepted as a valid symbol when considering UMIs in --bclconvert mode.


02 Aug 19:02

FelixKrueger

v0.25.0

0ad1701

v0.25.0 - new options and minor fixes

Bismark

now using 4 cores for merging multiple BAM files (more details #707)
fixed a corner case when reads were aligned in FastA mode with --parallel together with --ambiguous and/or --unmapped (see #723)

deduplicate_bismark

added check to see if the UMI appears to be in the middle of the readID, e.g. if added by bcl-convert (prompted in #699). Also added new option --bclconvert to use this internal UMI instead of the one at the end. Also allowing the + symbol now for dual-indexed runs
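As an illustration of the internal-UMI case, here is a minimal Python sketch (not the Perl code used by deduplicate_bismark). It assumes a bcl-convert style read ID in which seven colon-separated fields precede the UMI, and that the UMI is followed by whitespace, an underscore, or the end of the ID. The function name and the exact pattern are hypothetical.

```python
import re

# Assumption: seven colon-separated fields, then the UMI as the eighth field.
# The character class includes "+" so dual UMIs such as ACGT+TTGA are accepted.
UMI_FIELD = re.compile(r"^(?:[^:\s]+:){7}([ACGTN+]+)(?:[\s_]|$)")


def internal_umi(read_id):
    """Return the UMI embedded in the middle of a read ID, or None."""
    match = UMI_FIELD.match(read_id)
    return match.group(1) if match else None
```

A read ID without an eighth field simply yields None, which a caller could use to fall back to a UMI at the end of the read ID.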

bismark2bedGraph

fixed a bug in non-CpG methylation call for CHH context (more details #647)

coverage2cytosine

Expanded option --ff into --ffs to extract four, five, and six nucleotide contexts to enable hexamer context analyses. More details here: #717

filter_non_conversion

changed shebang line to use env

bismark2report

better handling of division-by-zero errors


27 Sep 09:01

FelixKrueger

v0.24.2

acf965c

Version 0.24.2

Just a few fixes, also added two flavours of scripts for merging coverage files (e.g. for when R1 and R2 had been run in single-end mode)

Bismark

removed an exit 0 that would terminate runs after processing a single (set of) input file(s).

deduplicate_bismark

Changed the path to Samtools to custom variable (#609)

coverage2cytosine

set threshold reads to 1 (if it was 0) for --gc_context as intended and mentioned in the help text. Fixes #621


29 May 08:48

FelixKrueger

0.24.1

7288cb6

monolithic beast no more

Added entirely new documentation website, built using Material for Mkdocs. Thanks to @ewels for a fantastic (late-night) effort to break up and restructure what had become a fairly unwieldy monolithic beast of markdown document...
Added docs for cytosine context summary, useful for GpC methylation or filtering for specific C context (e.g. CpA)
Updated docs for the dovetailing

Bismark

Warning messages for closing ambiguous and unmapped file handles now only occur when these options were specified.

Contributors

ewels


06 Oct 16:04

FelixKrueger

0.24.0

430df15

0.24.0 - long read support with minimap2

Bismark

Added new option --strandID which reports the alignment strand identity for paired-end, non-directional libraries, e.g. YS:Z:CTOT. This information may be difficult to obtain if third party tools interfered with the read ordering (admittedly there is a fine balance of read reporting position, FLAG, Read 1 and genome conversion state to make it work in the first place). More information can be found in this thread.
runs with --parallel/--multicore > 1 specified will now terminate with an error message whenever one of the child processes fails. This prevents potentially incomplete result files making it through to the end unnoticed (more #494)
runs with --parallel/--multicore > 1 as well as --unmapped and/or --ambiguous specified will no longer produce potentially corrupt FastQ files (more #495)
Added option --mm2/--minimap2 to use minimap2 as the underlying aligner. The minimap2 alignment modes include Oxford Nanopore, PacBio and accurate short reads. In its current implementation, minimap2 can be invoked in one of the following ways:
--mm2_nanopore: Sets preset settings for Oxford Nanopore vs reference mapping '-x map-ont' [default]
--mm2_pacbio: Sets preset settings for PacBio vs. reference mapping '-x map-pb'
--mm2_short_reads: Sets preset settings for accurate short reads '-x sr'
added option --mm2_maximum_length <int> to set a maximum length cutoff, which might be required for very long reads exceeding the maximum number of CIGAR operations tolerated by the BAM formatted reads (>65535). The default is 10,000 bp.
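To see why very long reads can run into the CIGAR limit mentioned above, the following Python sketch counts the operations in a CIGAR string and compares the count with 65535. It is an illustration only; the helper names are hypothetical and this is not how Bismark applies --mm2_maximum_length (which is a read length cutoff).

```python
import re

# One CIGAR operation is a length followed by one of the SAM operation codes.
CIGAR_OP = re.compile(r"\d+[MIDNSHP=X]")
MAX_CIGAR_OPS = 65535  # limit quoted in the release note


def cigar_op_count(cigar):
    """Count the number of operations in a SAM CIGAR string."""
    return len(CIGAR_OP.findall(cigar))


def exceeds_bam_limit(cigar):
    """True if the alignment has more operations than the quoted limit."""
    return cigar_op_count(cigar) > MAX_CIGAR_OPS
```

Noisy long reads tend to accumulate many short insertions and deletions, so the operation count grows with read length; capping the read length is a simple way to stay under the limit.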

Other options that are currently set within Bismark include '-a' (SAM output), '--MD' (MD tag), '--secondary=no'.

Prompted by fairly slow alignment speeds with the minimap2 default settings, we set out to improve the performance of the alignment process by tweaking several different parameters

Speed optimisation: tuning of minimap2 parameters

k-mer size
Due to the reduced DNA alphabet the minimap2 default k-mer size of 15 leads to substantially higher alignment times. Based on our tests we settled for a new default of ‘-k 20’
minibatch size
The minimap2 default minibatch size of 500 million bp means that a substantial amount of data is aligned and held in memory before additional alignment threads can be started. Reducing the minibatch size to 250K bp seemed to be a good compromise (‘-K 250K’).
minimap2 multi-threading
minimap2 alignments may utilize multiple cores for each alignment process; we found that ‘-t 2’ offered a good speed-up, while allowing additional resources had diminishing returns.
Bismark multi-threading
We also tested the potential of using additional resources for Bismark itself (--parallel), which appeared to result in a speed-up of the alignment process as expected; however this comes at the cost of requiring additional system resources.

As a result of these tests, we changed the default settings for minimap2 alignment parameters to ‘-t 2 -k 20 -K 250K’.
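The options listed in this section can be put together as a minimap2 argument list. The Python sketch below only assembles the flags named in these notes; the function name and the mode keys are hypothetical, the index and read file arguments are left out, and the exact command Bismark builds internally may differ.

```python
# Presets named in the release notes, keyed by a hypothetical mode label.
PRESETS = {"nanopore": "map-ont", "pacbio": "map-pb", "short_reads": "sr"}


def minimap2_args(mode="nanopore"):
    """Assemble the minimap2 options described in the notes (no file paths)."""
    return [
        "minimap2",
        "-a",              # SAM output
        "--MD",            # write the MD tag
        "--secondary=no",  # no secondary alignments
        "-x", PRESETS[mode],
        # Tuned values are placed after -x so they take precedence over the preset.
        "-t", "2",
        "-k", "20",
        "-K", "250K",
    ]
```

Calling minimap2_args("pacbio") gives the same list with the map-pb preset in place of map-ont.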

methylation_consistency

Added new option --chh to use cytosines in CHH instead of CpG context to enable some troubleshooting and method development

bismark2report

The CHH/CHG labels for the Cytosine Methylation after Extraction plot now appear in the correct order

bismark_methylation_extractor

removed a print statement that would flood STDOUT or the logfile if --merge_non_CG (but not --comprehensive) had been selected
runs with --parallel/--multicore specified will now terminate with an error message whenever one of the child processes fails. This prevents potentially incomplete result files making it through to the end unnoticed
changed the option -o/--output to -o/--output_dir for consistency reasons...

bismark_genome_preparation

Added option --mm2/--minimap2. The genome indexing process (bismark_genome_preparation) writes out a minimap2 index to the genome folder, using the optimized k-mer size of -k 20 (see comments for bismark itself). This pre-generated minimap2 index takes precedence over indexing options that would otherwise happen as part of the alignment procedure.

deduplicate_bismark

when using an output filename (-o customname), the name of the deduplication report will also be derived from customname.

Added a sentence to the Docs that Genozip 14 and above supports Bismark BAM files (with a substantial gain in compression).


26 Jul 07:40

FelixKrueger

0.23.1

73844f4

fix auto-detection

filter_non_conversion

fixed global setting of --paired or --single mode. Auto-detection now works by only looking at the @PG ID:Bismark line of the SAM header.

methylation_consistency

Auto-detection now works by only looking at the @PG ID:Bismark line of the SAM header.
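The idea behind the header-based auto-detection can be sketched in Python as follows. This is an illustration rather than the scripts' actual Perl code: the function names are hypothetical, and the rule that a paired-end command carries both -1 and -2 is an assumption made for this example.

```python
def bismark_pg_line(header_lines):
    """Return the @PG line whose ID is Bismark, ignoring all other @PG lines."""
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "@PG" and "ID:Bismark" in fields[1:]:
            return line.rstrip("\n")
    return None


def looks_paired_end(pg_line):
    """Assumption: a paired-end Bismark command line contains both -1 and -2."""
    tokens = pg_line.split()
    return "-1" in tokens and "-2" in tokens
```

Restricting the check to the Bismark @PG line means that options of other tools recorded in the header (for example a later samtools command) cannot influence the detected mode.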

coverage2cytosine

Swapped the columns for count methylated and count unmethylated for the context summary report to match the header line.


09 Nov 13:29

FelixKrueger

0.23.0

87df331

v0.23.0

Bismark Release v0.23.0

Migrated CI tests from Travis to Github Actions

deduplicate_bismark

the command deduplicate_bismark --barcode *bam now works again. Previously the output file names were accidentally all derived from the first supplied file in --barcode (= UMI) mode (it had been fixed for normal files in 0.22.2).
Changed the way the library auto-detection works to only look at the @PG ID:Bismark line of the SAM header (to only look for the Bismark command)

bismark_methylation_extractor / bismark2bedGraph

Added a new option --ucsc to bismark2bedGraph and bismark_methylation_extractor that will produce a UCSC-ready bedGraph file if the genome version used came from Ensembl. This option (i) prefixes chromosome names with 'chr', and (ii) changes the mitochondrial chromosome from 'MT' to 'chrM'. In addition, it will also write out a new file ending in .chromosome_sizes.txt for easier use of bedGraphToBigWig. More here.
Changed the way the library auto-detection works to only look at the @PG ID:Bismark line of the SAM header.
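The two renaming rules described for --ucsc are easy to illustrate. The Python sketch below applies them to a chromosome name and to a single tab-separated bedGraph line; the function names are hypothetical and this is not the code used by bismark2bedGraph.

```python
def to_ucsc_name(ensembl_name):
    """Apply the two renaming rules described for --ucsc."""
    if ensembl_name == "MT":
        return "chrM"  # mitochondrial chromosome is renamed, not just prefixed
    return "chr" + ensembl_name


def to_ucsc_bedgraph_line(line):
    """Rename the chromosome column of one tab-separated bedGraph line."""
    chrom, rest = line.split("\t", 1)
    return to_ucsc_name(chrom) + "\t" + rest
```

For example, a line starting with chromosome 1 comes back starting with chr1, and MT becomes chrM.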

coverage2cytosine

Added a new output file for all cytosine context methylation totals. More information here: #321.
Added new option --drach/--m6A. Most m6A sites are found in the conserved sequence motif DRACH (where D=G/A/U, R=G/A, H=A/U/C), and if bound by anti-m6A antibody, it causes the reverse transcriptase to introduce C to T transitions at the cytosine which follows A in the DRACH motif. This option also sets a coverage threshold of 1 unless specified explicitly. This is a very specialised option and should only be used by experimentalists looking at m6A methylation (where the C to T transition acts as a proxy of m6A).
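The DRACH motif can be written as a regular expression. The Python sketch below works on a DNA sequence (so U is written as T) and returns the 0-based positions of the cytosine that follows A in each motif occurrence. It only illustrates the motif definition; the function name is hypothetical and strand handling is left out.

```python
import re

# D = G/A/T, R = G/A, then A, C, and H = A/T/C (DNA alphabet, U written as T).
# The lookahead allows overlapping motif occurrences to be found.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_cytosines(sequence):
    """Return 0-based positions of the C in every DRACH occurrence."""
    # The C is the fourth base of the motif, i.e. match start + 3.
    return [m.start() + 3 for m in DRACH.finditer(sequence.upper())]
```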

bismark2summary

Samples with absolutely 0 methylation calls in some context are now excluded from the graphical HTML output (as they break rendering the entire summary graph section). These samples and their statistics do still appear in the file bismark_summary_report.txt. More information here: #315.


19 Nov 11:35

FelixKrueger

0.22.3

ee6952a

v0.22.3

Bismark

Accepted pull request to fix the MAPQ score calculation in local mode.

methylation_consistency

Added a new script to assess the concordance of methylation calls. See more here: https://github.com/FelixKrueger/Bismark/tree/master/Docs#x-concordance-of-methylation-calls-across-bisulfite-reads


16 Oct 14:41

FelixKrueger

0.22.2

f960b3a

0.22.2

Added FAQ document for questions that keep coming up. Will be populated over time.

Bismark

the option --non_bs_mm is now only allowed in end-to-end mode
Fixed the calculation of non bisulfite mismatches for paired-end data which happened correctly only when R2 had an InDel (see here)
When the option -u is used in conjunction with --parallel, only -u sequences are now written to the temporary subset files for each spawn of Bismark (previously, the entire file was split for --parallel, but then only a small subset of those files was used for -u, which resulted in very long runs even for a small number of analysed sequences)

deduplicate_bismark

the command deduplicate_bismark *bam now works again. Previously the output file names were accidentally all derived from the first supplied file.

coverage2cytosine

Added new option --coverage_threshold INT. Positions have to be covered by at least INT calls (irrespective of their methylation state) before they get reported. For NOMe-seq, the minimum threshold is automatically set to 1 unless specified explicitly. Setting a coverage threshold does not work in conjunction with --merge_CpGs (as all genomic CpGs are required for this). Default: 0 (i.e. all genomic positions get reported)
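The threshold rule can be expressed in a few lines. The Python sketch below filters records of the hypothetical form (chromosome, position, count methylated, count unmethylated); the record layout and function names are assumptions for illustration, not the coverage2cytosine file format or code.

```python
def passes_threshold(count_methylated, count_unmethylated, threshold=0):
    """A position is reported if its total number of calls reaches the threshold."""
    return count_methylated + count_unmethylated >= threshold


def filter_positions(records, threshold=0):
    """Keep records (chrom, pos, meth, unmeth) that satisfy the coverage threshold."""
    return [r for r in records if passes_threshold(r[2], r[3], threshold)]
```

With the default of 0 every position is kept, including positions without any calls; a threshold of 1 drops exactly those uncovered positions.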

bismark2report

added seconds to the timestamp report statement (which caused a warning on certain, but not all, platforms)

bismark2summary

Now reads splitting reports even for non-deduplicated files (such as RRBS).


21 Apr 15:52

FelixKrueger

0.22.1

6ab9539

Essential Easter Performance Release [EEPR]

Bismark

Hot-fixed (read: removed) the cause of delay during the MD:Z: field computation for reads containing a deletion (which was roughly equal to 1 second per read). Apologies, I did it again...
Changed the default --score_min function for HISAT2 in --local mode back to a linear function (instead of using the logarithmic model that is employed by Bowtie 2). The default is now --score_min L,0,-0.2 for both end-to-end (default) and --local mode. It should be mentioned that we currently don't understand how exactly the scoring mode in HISAT2 works (even though the scores appear to be all negative with a maximum value of 0), so this might change somewhat in the future. See here for more info.
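For readers unfamiliar with the function notation, L,0,-0.2 describes a linear function of the read length: minimum score = 0 + (-0.2) x read length. The Python sketch below evaluates the linear (L) and natural-log (G) forms as they are defined for Bowtie 2; the function name is hypothetical, and, as noted above, how HISAT2 applies the score internally is not covered here.

```python
import math


def min_score(function, read_length):
    """Evaluate a score-min style function such as 'L,0,-0.2' for a read length."""
    kind, constant, coefficient = function.split(",")
    constant, coefficient = float(constant), float(coefficient)
    if kind == "L":  # linear: a + b * x
        return constant + coefficient * read_length
    if kind == "G":  # natural log: a + b * ln(x)
        return constant + coefficient * math.log(read_length)
    raise ValueError("unsupported function type: " + kind)
```

For a 100 bp read, L,0,-0.2 gives a minimum score of about -20, and the bound scales proportionally with read length.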


Releases: FelixKrueger/Bismark
