v3.0.4

@oushujun released this 25 Jun 19:20

This update further enhances robustness for large genomes, streamlines overlap computations, and lays the groundwork for more scalable LTR discovery.

Major update

  1. Reduced RepeatMasker memory footprint with run_RM_split.pl:
  • Splits large FASTA inputs into manageable chunks for RepeatMasker, masking them in parallel.
  • Skips already-masked chunks, then merges results into a single .masked file.
  • Runs RepeatMasker on the full dataset first, then automatically falls back to chunked masking when the primary call fails or yields no repeats, capping parallel jobs to avoid OOM kills.
  2. Rewrote bed_intersect_wao.pl:
  • Simplified the buffering logic: only “active” B intervals are kept in memory, and they are purged by chromosome and start/end comparisons.
  • Replaced the circular-lookback logic with a single pass over an in-memory buffer, which supports arbitrary chromosome orders.
  • Dynamically detects the number of columns in B to generate the correct “wao” dummy lines when no overlaps are found.
  • Speed is comparable to the original bedtools intersect -wao.
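
The active-buffer idea behind the bed_intersect_wao.pl rewrite can be illustrated with a minimal Python sketch. It is not the script's actual code: the function name intersect_wao and the row layout (lists of string fields: chrom, start, end, extra columns) are hypothetical, and A is assumed to be sorted by start within each chromosome. For brevity the sketch groups B by chromosome in a dictionary up front rather than streaming it, which is a simplification. The placeholder layout for A rows without overlaps (".", "-1", "-1", then "." for any extra B columns, with an overlap of 0) is assumed to match bedtools intersect -wao.

```python
from collections import defaultdict


def intersect_wao(a_rows, b_rows):
    """Yield `-wao`-style rows: A fields + B fields + overlap length in bp.

    Assumption: a_rows is sorted by start within each chromosome.
    Chromosomes themselves may appear in any order.
    """
    b_rows = [list(r) for r in b_rows]
    # Detect B's column count so placeholder rows have the right width.
    n_b_cols = len(b_rows[0]) if b_rows else 3
    dummy = [".", "-1", "-1"] + ["."] * (n_b_cols - 3)

    # Simplification: index B by chromosome and sort by start.
    by_chrom = defaultdict(list)
    for row in b_rows:
        by_chrom[row[0]].append(row)
    for rows in by_chrom.values():
        rows.sort(key=lambda r: int(r[1]))

    next_idx = defaultdict(int)  # per chromosome: next B row not yet admitted
    active = defaultdict(list)   # per chromosome: B rows that may still overlap

    for a in a_rows:
        chrom, a_start, a_end = a[0], int(a[1]), int(a[2])
        pool = by_chrom.get(chrom, [])

        # Admit every B row that starts before this A interval ends.
        i = next_idx[chrom]
        while i < len(pool) and int(pool[i][1]) < a_end:
            active[chrom].append(pool[i])
            i += 1
        next_idx[chrom] = i

        # Purge B rows ending at or before this A start: because A is
        # sorted by start, they cannot overlap any later A interval.
        active[chrom] = [b for b in active[chrom] if int(b[2]) > a_start]

        hits = 0
        for b in active[chrom]:
            overlap = min(a_end, int(b[2])) - max(a_start, int(b[1]))
            if overlap > 0:
                yield list(a) + b + [str(overlap)]
                hits += 1
        if hits == 0:
            # No overlap: emit the A row with a placeholder B record.
            yield list(a) + dummy + ["0"]
```

Each A row is compared only against the B rows still in the active list for its chromosome, so the work per row depends on local interval density rather than on the size of B.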

Minor update

  1. Refactored LTR.identifier.pl:
  • Fixed a stray commented-out guard so that undefined scan entries are now properly skipped (#193).