This update further enhances robustness for large genomes, streamlines overlap computations, and lays the groundwork for more scalable LTR discovery.
Major update
- Reduced RepeatMasker memory footprint with run_RM_split.pl:
- Splits large FASTA inputs into manageable chunks for RepeatMasker, masking them in parallel.
- Skips already-masked chunks, then merges results into a single .masked file.
- Runs RepeatMasker on the full dataset first, then automatically falls back to the chunked masking strategy when the primary call fails or yields no repeats, capping parallel jobs to avoid OOM kills.
- Rewrote bed_intersect_wao.pl
- Simplified buffering logic: only “active” B intervals are kept in memory, and they are purged by chromosome and start/end comparisons.
- Replaced the circular-lookback logic with a single pass over an in-memory buffer, supporting arbitrary chromosome orders.
- Dynamically detect the number of columns in B to generate the correct “wao” dummy lines when no overlaps are found.
- Speed is comparable to the original bedtools intersect -wao.
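
The buffering approach described for bed_intersect_wao.pl can be illustrated with a minimal Python sketch. This is not the Perl implementation: the function name `intersect_wao` is hypothetical, B is grouped by chromosome in memory for brevity, A is assumed to be sorted by start within each chromosome, and the no-overlap placeholder (`.`, `-1`, `-1`, then `.` for each remaining B column, with an overlap of `0`) reflects the usual `-wao` convention as an assumption.

```python
from collections import defaultdict


def intersect_wao(a_rows, b_rows):
    """Report A/B overlaps in a -wao-like layout (illustrative sketch only).

    Rows are lists of string fields: chrom, start, end, then optional extras.
    A must be sorted by start within each chromosome; chromosomes may appear
    in any order.
    """
    # Detect B's column count so the placeholder row has the right width.
    b_cols = len(b_rows[0]) if b_rows else 3
    dummy = [".", "-1", "-1"] + ["."] * (b_cols - 3)

    b_by_chrom = defaultdict(list)
    for row in b_rows:
        b_by_chrom[row[0]].append(row)
    for rows in b_by_chrom.values():
        rows.sort(key=lambda r: int(r[1]))

    cursor = defaultdict(int)    # next unread B index per chromosome
    active = defaultdict(list)   # B intervals that may still overlap
    out = []
    for a in a_rows:
        chrom, a_start, a_end = a[0], int(a[1]), int(a[2])
        b_list = b_by_chrom.get(chrom, [])
        buf = active[chrom]
        # Pull in every B interval that starts before this A interval ends.
        while cursor[chrom] < len(b_list) and int(b_list[cursor[chrom]][1]) < a_end:
            buf.append(b_list[cursor[chrom]])
            cursor[chrom] += 1
        # Purge B intervals that end at or before this A start; later A
        # intervals start no earlier, so these can never overlap again.
        buf[:] = [b for b in buf if int(b[2]) > a_start]
        hits = []
        for b in buf:
            overlap = min(a_end, int(b[2])) - max(a_start, int(b[1]))
            if overlap > 0:
                hits.append(a + b + [str(overlap)])
        out.extend(hits or [a + dummy + ["0"]])
    return out
```

Each A interval yields either one row per overlapping B interval, with the overlap length appended, or a single placeholder row. Because the cursor and buffer are kept per chromosome, the order in which chromosomes appear in A does not matter in this sketch.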
Minor update
- Refactored LTR.identifier.pl
- Fixed a stray commented guard so that undefined scan entries are now properly skipped (#193).
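
For reference, the split, skip, and merge behaviour described for run_RM_split.pl under Major update can be sketched in Python as follows. This is only an illustration under stated assumptions: `chunk_records` and `mask_all` are hypothetical names, chunks are assumed to be built from whole sequences up to a base-count limit, `mask_chunk` is a stand-in callable for the actual RepeatMasker invocation, and parallel execution is left out.

```python
def chunk_records(records, max_bases):
    """Group (header, sequence) records into chunks of at most max_bases.

    A single record longer than max_bases still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for header, seq in records:
        if current and size + len(seq) > max_bases:
            chunks.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        chunks.append(current)
    return chunks


def mask_all(records, max_bases, mask_chunk, done=None):
    """Mask every chunk not already present in `done`, then merge in order.

    `done` maps chunk index -> masked records from an earlier run
    (hypothetical bookkeeping; the real script may track this differently).
    """
    done = dict(done or {})
    for idx, chunk in enumerate(chunk_records(records, max_bases)):
        if idx not in done:          # skip chunks that were already masked
            done[idx] = mask_chunk(chunk)
    merged = []
    for idx in sorted(done):         # merge back in original chunk order
        merged.extend(done[idx])
    return merged
```

Passing a `done` mapping from an interrupted run lets a rerun mask only the chunks that are still missing before the merge.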