Releases: FelixKrueger/Bismark

v0.25.1 - tolerate + symbol for UMIs for bclconvert deduplication

06 Aug 15:41
44618d2

Allows the + sign as a valid symbol in UMIs in --bclconvert mode (more details)

v0.25.0 - new options and minor fixes

02 Aug 19:02
0ad1701

Bismark

  • now using 4 cores for merging multiple BAM files (more details #707)

  • fixed a corner case affecting reads aligned in FastA mode with --parallel in combination with --ambiguous and/or --unmapped (see #723)

deduplicate_bismark

  • added a check for whether the UMI appears in the middle of the read ID, e.g. when it was added by bcl-convert (prompted by #699). Also added a new option --bclconvert to use this internal UMI instead of the one at the end of the read ID. The + symbol is now also allowed in UMIs from dual-indexed runs.
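The difference between an end-of-read-ID UMI and a bcl-convert UMI can be illustrated with a minimal Python sketch (Bismark itself is written in Perl, so this is not its code). It assumes that whitespace in the read ID has been replaced by an underscore, so a bcl-convert read ID looks like `...:<UMI>_1:N:0:<index>` with the UMI in the eighth colon-separated field, and that dual UMIs are joined by `+`. The helper names `umi_from_end`, `umi_from_bclconvert` and `is_valid_umi` are hypothetical.

```python
import re

# Hypothetical helpers illustrating UMI extraction; not Bismark's own implementation.
# Assumption: UMIs consist of A/C/G/T/N, with '+' joining the two UMIs of a dual-indexed run.
UMI_PATTERN = re.compile(r"^[ACGTN]+(\+[ACGTN]+)*$")


def umi_from_end(read_id: str) -> str:
    """Default mode: the UMI is assumed to be the last ':'-separated field."""
    return read_id.split(":")[-1]


def umi_from_bclconvert(read_id: str) -> str:
    """bcl-convert mode: the UMI is assumed to be the 8th ':'-separated field,
    possibly followed by '_' or whitespace and the read/index comment."""
    fields = read_id.split(":")
    if len(fields) < 8:
        raise ValueError(f"read ID has fewer than 8 fields: {read_id!r}")
    # Drop anything after an underscore or whitespace (e.g. '_1' in '<UMI>_1:N:0:...')
    return re.split(r"[_\s]", fields[7], maxsplit=1)[0]


def is_valid_umi(umi: str) -> bool:
    """Accept single UMIs as well as dual UMIs joined by '+'."""
    return bool(UMI_PATTERN.match(umi))


if __name__ == "__main__":
    rid = "A00123:8:HXXXXXXX:1:1101:1000:2000:ACGTAC+GGTTAA_1:N:0:ATCACG"
    umi = umi_from_bclconvert(rid)
    print(umi, is_valid_umi(umi))  # ACGTAC+GGTTAA True
```

For such a read ID, the default "last field" rule would return the sample index (`ATCACG`) rather than the UMI, which is why a separate mode is useful.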

bismark2bedGraph

  • fixed a bug in non-CpG methylation call for CHH context (more details #647)

coverage2cytosine

  • Expanded option --ff into --ffs to extract four-, five- and six-nucleotide contexts to enable hexamer context analyses. More details here: #717
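What "four-, five- and six-nucleotide context" means for a single cytosine can be sketched in Python. This is a minimal illustration, not coverage2cytosine's code; it assumes (as for the usual CG/CHG/CHH contexts) that the context is read 5'→3' on the cytosine's own strand starting at the C, so a G on the top strand is treated as a bottom-strand cytosine and its upstream window is reverse-complemented. The function name `cytosine_contexts` is hypothetical.

```python
# Hypothetical helper; not coverage2cytosine's actual implementation.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def cytosine_contexts(seq: str, pos: int, lengths=(4, 5, 6)) -> dict:
    """Return the k-nucleotide context of the cytosine at 0-based `pos`,
    read 5'->3' on the cytosine's strand (C = top strand, G = bottom strand).
    A length maps to None if its window runs off the end of `seq`."""
    seq = seq.upper()
    base = seq[pos]
    contexts = {}
    for k in lengths:
        if base == "C":
            # top-strand cytosine: take the k bases starting at the C
            ctx = seq[pos:pos + k] if pos + k <= len(seq) else None
        elif base == "G":
            if pos - k + 1 >= 0:
                # bottom-strand cytosine: reverse-complement the window ending at the G
                ctx = seq[pos - k + 1:pos + 1].translate(COMPLEMENT)[::-1]
            else:
                ctx = None
        else:
            raise ValueError(f"position {pos} is not a C or G: {base!r}")
        contexts[k] = ctx
    return contexts


if __name__ == "__main__":
    print(cytosine_contexts("ACGTTAGC", 1))  # {4: 'CGTT', 5: 'CGTTA', 6: 'CGTTAG'}
```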

filter_non_conversion

  • changed shebang line to use env

bismark2report

Version 0.24.2

27 Sep 09:01
acf965c

Just a few fixes; also added two flavours of scripts for merging coverage files (e.g. for cases where R1 and R2 had been run in single-end mode)

Bismark

  • removed an exit 0 that would terminate runs after processing a single (set of) input file(s).

deduplicate_bismark

  • Changed the path to Samtools to use a custom variable (#609)

coverage2cytosine

  • For --gc_context, the read threshold is now set to 1 (if it was 0), as intended and as mentioned in the help text. Fixes #621

monolithic beast no more

29 May 08:48
7288cb6
  • Added an entirely new documentation website, built using Material for MkDocs. Thanks to @ewels for a fantastic (late-night) effort to break up and restructure what had become a fairly unwieldy, monolithic beast of a markdown document...

  • Added docs for the cytosine context summary, which is useful for GpC methylation or for filtering for a specific C context (e.g. CpA)

  • Updated the docs on dovetailing

Bismark

  • Warning messages about closing the ambiguous and unmapped file handles now only appear when these options were specified (see here)

0.24.0 - long read support with minimap2

06 Oct 16:04
430df15

Bismark

  • Added new option --strandID, which reports the alignment strand identity for paired-end, non-directional libraries, e.g. YS:Z:CTOT. This information may be difficult to obtain if third-party tools have interfered with the read ordering (admittedly, making it work in the first place relies on a fine balance of read reporting position, FLAG, Read 1 and genome conversion state). More information can be found in this thread.

  • runs with --parallel/--multicore > 1 specified will now terminate with an error message whenever one of the child processes fails. This prevents potentially incomplete result files making it through to the end unnoticed (more #494)

  • runs with --parallel/--multicore > 1 as well as --unmapped and/or --ambiguous specified will no longer produce potentially corrupt FastQ files (more #495)

  • Added option --mm2/--minimap2 to use minimap2 as the underlying aligner. The minimap2 alignment modes include Oxford Nanopore, PacBio and accurate short reads. In its current implementation, minimap2 can be invoked in one of the following ways:

      • --mm2_nanopore: uses the preset for Oxford Nanopore vs. reference mapping, '-x map-ont' [default]

      • --mm2_pacbio: uses the preset for PacBio vs. reference mapping, '-x map-pb'

      • --mm2_short_reads: uses the preset for accurate short reads, '-x sr'

  • added option --mm2_maximum_length <int> to set a maximum read length cutoff. This may be required for very long reads whose alignments exceed the maximum number of CIGAR operations that a BAM record can hold (65,535). The default is 10,000 bp.
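The CIGAR limit mentioned above comes from the BAM format storing the number of CIGAR operations in a 16-bit field. The minimal Python sketch below only illustrates how to count operations in a SAM CIGAR string and compare the count with that limit; it is not how Bismark applies --mm2_maximum_length (which, as described, is a read length cutoff), and the function names are hypothetical.

```python
import re

# The BAM record stores the number of CIGAR operations in a 16-bit field.
MAX_BAM_CIGAR_OPS = 65535

# One CIGAR operation = a length followed by one of the SAM operators.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_cigar_ops(cigar: str) -> int:
    """Count the operations (length/operator pairs) in a SAM CIGAR string."""
    if cigar == "*":  # unavailable CIGAR, e.g. for unmapped reads
        return 0
    ops = CIGAR_OP.findall(cigar)
    # Reject strings that contain anything besides well-formed operations
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return len(ops)


def fits_in_bam(cigar: str) -> bool:
    """True if the CIGAR can be stored in the BAM n_cigar_op field."""
    return count_cigar_ops(cigar) <= MAX_BAM_CIGAR_OPS


if __name__ == "__main__":
    print(count_cigar_ops("10M2I5M1D20M"))  # 5
```

Long reads with frequent small indels are the typical way to hit this limit, since every indel adds at least two operations.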

Other options that are currently set within Bismark include '-a' (SAM output), '--MD' (MD tag) and '--secondary=no'.

Prompted by fairly slow alignment speeds with the minimap2 default settings, we set out to improve the performance of the alignment process by tweaking several parameters.

Speed optimisation: optimisation of minimap2 parameters

k-mer size
Due to the reduced DNA alphabet of bisulfite-converted reads, the minimap2 default k-mer size of 15 leads to substantially longer alignment times. Based on our tests, we settled on a new default of ‘-k 20’.
minibatch size
The minimap2 default minibatch size of 500 million bp means that a substantial amount of data is aligned and held in memory before additional alignment threads can be started. Reducing the minibatch size to 250K bp seemed to be a good compromise (‘-K 250K’; minimap2's -K value counts bases, not reads).
minimap2 multi-threading
minimap2 alignments may utilize multiple cores for each alignment process; we found that ‘-t 2’ offered a good speed-up, while allowing additional resources had diminishing returns.
Bismark multi-threading
We also tested the potential of using additional resources for Bismark itself (--parallel), which appeared to speed up the alignment process as expected; however, this comes at the cost of requiring additional system resources.
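A back-of-the-envelope illustration of the k-mer argument above (our own arithmetic, not one of the benchmarks): if fully converted reads are treated as using only three letters, the number of distinct k-mers drops from 4^k to 3^k, so each seed is less specific and matches more candidate loci.

$$
4^{15} = 1\,073\,741\,824, \qquad 3^{15} = 14\,348\,907, \qquad 3^{20} = 3\,486\,784\,401
$$

With a three-letter alphabet, k = 15 therefore allows roughly 75 times fewer distinct k-mers than in unconverted DNA (4^15 / 3^15 ≈ 74.8), whereas k = 20 restores a k-mer space of the same order of magnitude as 4^15. This is only a heuristic for seed specificity; the speed gains themselves were measured empirically.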

As a result of these tests, we changed the default settings for minimap2 alignment parameters to ‘-t 2 -k 20 -K 250K’.
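The resulting parameter set can be summarised as a command line. The minimal Python sketch below only assembles an argument list from the presets and defaults listed above and does not run minimap2; it is not Bismark's internal code (Bismark aligns against its own converted genome indexes), and the helper `minimap2_args` as well as the file names are hypothetical.

```python
# Hypothetical helper that assembles a minimap2 command line from the settings
# described above; it is NOT Bismark's internal code and does not run minimap2.
PRESETS = {
    "nanopore": "map-ont",   # --mm2_nanopore (default)
    "pacbio": "map-pb",      # --mm2_pacbio
    "short_reads": "sr",     # --mm2_short_reads
}


def minimap2_args(index: str, reads: str, mode: str = "nanopore") -> list[str]:
    if mode not in PRESETS:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(PRESETS)}")
    return [
        "minimap2",
        "-x", PRESETS[mode],                   # preset first, so later options override it
        "-a", "--MD", "--secondary=no",        # SAM output, MD tag, no secondary alignments
        "-t", "2", "-k", "20", "-K", "250K",   # tuned defaults from the tests above
        index, reads,
    ]


if __name__ == "__main__":
    print(" ".join(minimap2_args("genome.mmi", "reads.fq")))
```

The preset is placed before the other options because minimap2's documentation notes that options given after -x overwrite the values the preset sets.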

methylation_consistency

  • Added new option --chh to use cytosines in CHH instead of CpG context, to enable some troubleshooting and method development

bismark2report

  • The CHH/CHG labels for the Cytosine Methylation after Extraction plot now appear in the correct order

bismark_methylation_extractor

  • removed a print statement that would flood STDOUT and the log file if --merge_non_CG (but not --comprehensive) had been selected

  • runs with --parallel/--multicore specified will now terminate with an error message whenever one of the child processes fails. This prevents potentially incomplete result files making it through to the end unnoticed

  • changed the option -o/--output to -o/--output_dir for consistency reasons...

bismark_genome_preparation

  • Added option --mm2/--minimap2. The genome indexing process (bismark_genome_preparation) writes a minimap2 index to the genome folder, using the optimized k-mer size of -k 20 (see the comments for bismark itself). This pre-generated minimap2 index takes precedence over the indexing that would otherwise happen as part of the alignment procedure.

deduplicate_bismark

  • when a custom output filename is specified with -o customname, the name of the deduplication report is now also derived from customname.

Added a sentence to the Docs that Genozip 14 and above supports Bismark BAM files (with a substantial gain in compression).

fix auto-detection

26 Jul 07:40

filter_non_conversion

  • fixed global setting of --paired or --single mode. Auto-detection now works by only looking at the @PG ID:Bismark line of the SAM header.
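Restricting auto-detection to the `@PG ID:Bismark` line means that `@PG` lines added later by other tools (e.g. samtools) cannot affect the result. The minimal Python sketch below illustrates the idea; it assumes that a paired-end run can be recognised by the `-1`/`-2` arguments in the Bismark command line (`CL:` field), which is our assumption rather than a description of the tools' Perl code. The function names are hypothetical.

```python
import shlex


def bismark_pg_command(header_lines):
    """Return the CL: value of the '@PG ID:Bismark' header line, or None."""
    for line in header_lines:
        if not line.startswith("@PG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f)
        if fields.get("ID") == "Bismark":
            return fields.get("CL")
    return None


def is_paired_end(header_lines) -> bool:
    """Assumption: a paired-end Bismark run was given its mates via -1 and -2."""
    cl = bismark_pg_command(header_lines)
    if cl is None:
        raise ValueError("no @PG ID:Bismark line found in the header")
    # The CL value may be wrapped in double quotes; strip them before tokenising
    tokens = shlex.split(cl.strip('"'))
    return "-1" in tokens and "-2" in tokens


if __name__ == "__main__":
    header = [
        "@HD\tVN:1.0\tSO:none",
        '@PG\tID:Bismark\tVN:v0.23.0\tCL:"bismark --genome /ref -1 R1.fq.gz -2 R2.fq.gz"',
        "@PG\tID:samtools\tPN:samtools\tCL:samtools view -h in.bam",
    ]
    print(is_paired_end(header))  # True
```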

methylation_consistency

  • Auto-detection now works by only looking at the @PG ID:Bismark line of the SAM header.

coverage2cytosine

  • Swapped the columns for count methylated and count unmethylated for the context summary report to match the header line.

v0.23.0

09 Nov 13:29

Bismark Release v0.23.0

  • Migrated CI tests from Travis to GitHub Actions

deduplicate_bismark

  • the command deduplicate_bismark --barcode *bam now works again. Previously the output file names were accidentally all derived from the first supplied file in --barcode (= UMI) mode (it had been fixed for normal files in 0.22.2).

  • Changed the way the library auto-detection works to only look at the @PG ID:Bismark line of the SAM header (to only look for the Bismark command)

bismark_methylation_extractor / bismark2bedGraph

  • Added a new option --ucsc to bismark2bedGraph and bismark_methylation_extractor that will produce a UCSC-ready bedGraph file if the genome version used came from Ensembl. This option (i) prefixes chromosome names with 'chr', and (ii) changes the mitochondrial chromosome from 'MT' to 'chrM'. In addition, it will also write out a new file ending in .chromosome_sizes.txt for easier use of bedGraphToBigWig. More here.

  • Changed the way the library auto-detection works to only look at the @PG ID:Bismark line of the SAM header.
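The two renaming rules of --ucsc, plus the two-column chromosome sizes file expected by bedGraphToBigWig, can be sketched in Python as follows. This is a minimal illustration, not Bismark's code; leaving names that already start with `chr` untouched, passing `track` lines through unchanged, and the function names are our assumptions.

```python
# Hypothetical sketch of the --ucsc renaming; not Bismark's own implementation.
def ucsc_chrom_name(ensembl_name: str) -> str:
    """'MT' -> 'chrM', '1' -> 'chr1'; names already starting with 'chr' are kept."""
    if ensembl_name == "MT":
        return "chrM"
    if ensembl_name.startswith("chr"):
        return ensembl_name
    return "chr" + ensembl_name


def convert_bedgraph_line(line: str) -> str:
    """Rename the chromosome column of one bedGraph line."""
    if line.startswith("track"):  # header line, nothing to rename
        return line
    chrom, rest = line.split("\t", 1)
    return ucsc_chrom_name(chrom) + "\t" + rest


def write_chrom_sizes(sizes: dict, path: str) -> None:
    """Write '<chrom>\\t<length>' lines, the format bedGraphToBigWig reads."""
    with open(path, "w") as fh:
        for name, length in sizes.items():
            fh.write(f"{ucsc_chrom_name(name)}\t{length}\n")


if __name__ == "__main__":
    print(convert_bedgraph_line("MT\t10\t11\t100\n"), end="")  # chrM	10	11	100
```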

coverage2cytosine

  • Added a new output file for all cytosine context methylation totals. More information here: #321.

  • Added new option --drach/--m6A. Most m6A sites are found in the conserved sequence motif DRACH (where D=G/A/U, R=G/A, H=A/U/C), and if bound by an anti-m6A antibody, m6A causes the reverse transcriptase to introduce C to T transitions at the cytosine which follows A in the DRACH motif. This option also sets a coverage threshold of 1 unless specified explicitly. This is a very specialised option and should only be used by experimentalists looking at m6A methylation (where the C to T transition acts as a proxy of m6A).
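Locating the cytosine of interest within DRACH motifs can be sketched with a regular expression. This minimal Python example is not coverage2cytosine's code: it writes U as T (D = A/G/T, H = A/C/T), scans only the strand it is given, and the function name `drach_cytosines` is hypothetical.

```python
import re

# DRACH with U written as T: D = A/G/T, R = A/G, then A, C, and H = A/C/T.
# The zero-width lookahead lets finditer report overlapping motifs as well.
DRACH = re.compile(r"(?=[AGT][AG]AC[ACT])")


def drach_cytosines(seq: str) -> list[int]:
    """0-based positions of the C (4th base) of each DRACH motif in `seq`."""
    return [m.start() + 3 for m in DRACH.finditer(seq.upper())]


if __name__ == "__main__":
    print(drach_cytosines("AAACAGGACT"))  # [3, 8]
```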

bismark2summary

  • Samples with no methylation calls at all in some context are now excluded from the graphical HTML output (as they break the rendering of the entire summary graph section). These samples and their statistics do still appear in the file bismark_summary_report.txt. More information here: #315.

v0.22.3

19 Nov 11:35

Bismark

  • Accepted pull request to fix the MAPQ score calculation in local mode.

methylation_consistency

0.22.2

16 Oct 14:41
f960b3a
  • Added an FAQ document for questions that keep coming up. It will be populated over time.

Bismark

  • the option --non_bs_mm is now only allowed in end-to-end mode

  • Fixed the calculation of non-bisulfite mismatches for paired-end data, which was previously only correct when R2 had an InDel (see here)

  • When the option -u was used in conjunction with --parallel, only the sequences requested with -u are now written to the temporary subset files for each spawn of Bismark (previously, the entire file was split for --parallel, but then only a small subset of those files was used for -u, which resulted in very long runs even when analysing only a small number of sequences)
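The fix for -u with --parallel amounts to "take the first N records, then split", rather than "split everything, then use N". A minimal Python sketch of that ordering follows; distributing records round-robin into in-memory lists (instead of temporary files) is our assumption for illustration, and `split_first_n` is a hypothetical name.

```python
from itertools import islice


def split_first_n(records, n: int, cores: int) -> list[list]:
    """Keep only the first n records (cf. -u) and deal them round-robin
    into `cores` subsets (cf. --parallel). The input is read lazily, so
    nothing beyond the first n records is consumed."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    subsets = [[] for _ in range(cores)]
    for i, rec in enumerate(islice(records, n)):
        subsets[i % cores].append(rec)
    return subsets


if __name__ == "__main__":
    print(split_first_n(range(1_000_000), 10, 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```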

deduplicate_bismark

  • the command deduplicate_bismark *bam now works again. Previously the output file names were accidentally all derived from the first supplied file.

coverage2cytosine

  • Added new option --coverage_threshold INT. Positions have to be covered by at least INT calls (irrespective of their methylation state) before they get reported. For NOMe-seq, the minimum threshold is automatically set to 1 unless specified explicitly. Setting a coverage threshold does not work in conjunction with --merge_CpGs (as all genomic CpGs are required for this). Default: 0 (i.e. all genomic positions get reported)
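The threshold rule above can be sketched in Python. This is a minimal illustration, not coverage2cytosine's code; it assumes the tab-separated cytosine report layout (chromosome, position, strand, count methylated, count unmethylated, C-context, trinucleotide context), and the function names are hypothetical.

```python
def effective_threshold(user_value, nome_seq: bool) -> int:
    """Default 0; for NOMe-seq the default becomes 1 unless set explicitly."""
    if user_value is not None:
        return user_value
    return 1 if nome_seq else 0


def filter_report(lines, threshold: int):
    """Yield cytosine report lines whose total number of calls
    (methylated + unmethylated) is at least `threshold`."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        meth, unmeth = int(fields[3]), int(fields[4])
        if meth + unmeth >= threshold:
            yield line


if __name__ == "__main__":
    report = ["chr1\t10\t+\t0\t0\tCG\tCGA\n", "chr1\t11\t-\t2\t1\tCG\tCGT\n"]
    print(len(list(filter_report(report, effective_threshold(None, nome_seq=True)))))  # 1
```

With the default threshold of 0, every line passes, which matches "all genomic positions get reported".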

bismark2report

  • added seconds to the timestamp in the report statement (their absence caused a warning on certain, but not all, platforms)

bismark2summary

  • Now reads splitting reports even for non-deduplicated files (such as RRBS).

Essential Easter Performance Release [EEPR]

21 Apr 15:52

Bismark

  • Hot-fixed (read: removed) the cause of delay during the MD:Z: field computation for reads containing a deletion (which was roughly equal to 1 second per read). Apologies, I did it again...

  • Changed the default --score_min function for HISAT2 in --local mode back to a linear function (instead of using the logarithmic model that is employed by Bowtie 2). The default is now --score_min L,0,-0.2 for both end-to-end (default) and --local mode. It should be mentioned that we currently don't understand how exactly the scoring mode in HISAT2 works (even though the scores appear to be all negative with a maximum value of 0), so this might change somewhat in the future. See here for more info.
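The --score_min value follows Bowtie 2's function syntax F,B,A, meaning f(x) = B + A · g(x), where x is the read length and g is linear (L), square root (S) or natural log (G). The minimal Python sketch below evaluates such a function; it reflects Bowtie 2's documented semantics, and, as noted above, HISAT2's exact scoring behaviour may differ. The constant type (C) is deliberately left out, and `score_min` is a hypothetical name.

```python
import math

# Bowtie 2-style function "F,B,A": f(x) = B + A * g(x), with x = read length.
_G = {
    "L": lambda x: float(x),  # linear
    "S": math.sqrt,           # square root
    "G": math.log,            # natural logarithm
}


def score_min(spec: str, read_length: int) -> float:
    """Minimum alignment score for a read of the given length."""
    ftype, const, coeff = spec.split(",")
    if ftype not in _G:
        raise ValueError(f"unsupported function type {ftype!r}")
    return float(const) + float(coeff) * _G[ftype](read_length)


if __name__ == "__main__":
    # L,0,-0.2 allows a minimum score of -20 for a 100 bp read, -30 for 150 bp
    print(score_min("L,0,-0.2", 100), score_min("L,0,-0.2", 150))
```

A linear function scales the tolerated penalty proportionally with read length, whereas the logarithmic model (e.g. G,20,8 for Bowtie 2 in --local mode) grows much more slowly for long reads.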