Introduce LAI to LTR_retriever for measurement of genome assembly continuity
Pre-release
New feature: The LTR-RT Assembly Index (LAI) for evaluation of genome assembly continuity
Description: LTR retrotransposons (LTR-RTs) are very difficult to assemble due to their repetitive nature (up to 75% of a genome, e.g., maize) and their length (up to 20 kb). LAI rests on a simple idea: the more continuous a genome assembly is, the more intact LTR-RTs can be found in it. This module calculates LAI from the list of intact LTR-RTs and the whole-genome LTR-RT annotation produced by LTR_retriever (*.pass.list and *.out, respectively). A window-based calculation is implemented to estimate regional continuity. A manuscript describing this feature is in preparation.
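
To illustrate what a window-based calculation can look like, here is a minimal Python sketch. It is not the LTR_retriever implementation. It assumes, for illustration only, that the per-window score is the percentage of annotated LTR-RT bases that belong to intact LTR-RTs; the actual LAI formula is defined by the module and the upcoming manuscript. The function names, the half-open interval format, and the `window` and `step` parameters are all hypothetical.

```python
def covered(intervals, lo, hi):
    """Count bases of [lo, hi) covered by half-open, non-overlapping intervals."""
    return sum(max(0, min(end, hi) - max(start, lo)) for start, end in intervals)


def windowed_index(intact, annotated, seq_len, window, step):
    """Slide a window along one sequence and score each window.

    ASSUMPTION: score = 100 * intact LTR-RT bases / all annotated LTR-RT bases.
    This ratio is illustrative and may differ from the real LAI definition.
    """
    scores = []
    for lo in range(0, seq_len, step):
        hi = min(lo + window, seq_len)
        total = covered(annotated, lo, hi)
        score = 100.0 * covered(intact, lo, hi) / total if total else 0.0
        scores.append((lo, hi, score))
        if hi == seq_len:  # last window reached the end of the sequence
            break
    return scores


# Toy coordinates, not real data.
intact = [(0, 100)]
annotated = [(0, 100), (200, 300), (1000, 1100)]
result = windowed_index(intact, annotated, seq_len=2000, window=1000, step=1000)
```

In this toy input the first window contains one intact element out of two annotated ones, and the second window contains no intact element, so the two windows score differently even though both carry LTR-RT sequence.
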
Other features:
- Improved the purging criteria: introduced an identity cutoff for alignment hits (>=30%) and replaced the alignment-length criterion with an identity-length criterion, where identity-length = alignment length - mismatches. A real hit requires identity-length >= 90.
- Added scripts to identify solo LTRs and complete LTRs, estimate the solo-to-complete ratio for each family, and count family sizes in the genome. This code was initially developed for this study: https://www.nature.com/articles/s41467-017-02546-5
- Added a minimum length requirement (>=100 bp) for the internal regions of LTR candidates.
- Updated the manual
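
The purging criteria listed above can be sketched as a small predicate. This is a hedged illustration in Python, not the code shipped with LTR_retriever: the function name and parameter names are hypothetical, and it assumes that the identity cutoff and the identity-length cutoff must both be met for a hit to count as real.

```python
def is_real_hit(identity_pct, alignment_length, mismatches,
                min_identity=30.0, min_identity_length=90):
    """Return True if an alignment hit passes both cutoffs.

    identity-length = alignment length - mismatches (as in the release note).
    ASSUMPTION: the two criteria are combined with a logical AND.
    """
    identity_length = alignment_length - mismatches
    return identity_pct >= min_identity and identity_length >= min_identity_length


# Toy hits, not real alignments.
long_hit = is_real_hit(85.0, 120, 18)        # identity-length = 102
short_hit = is_real_hit(95.0, 92, 5)         # identity-length = 87
divergent_hit = is_real_hit(25.0, 400, 300)  # identity below 30%
```

The second toy hit shows why the change matters: its alignment length alone exceeds 90, but after subtracting mismatches it falls below the cutoff.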