efmcalculator2 is a Python package and web tool for detecting mutational hotspots. It predicts the mutation rate associated with each hotspot and combines these rates into a relative instability score. Detected hotspots include simple sequence repeats, repeat-mediated deletions, and short repeat sequences. efmcalculator2 updates and improves upon the previous version of the EFM Calculator.
efmcalculator2 accepts multi-FASTA, GenBank, or CSV files as input, takes its parameters from the command line, and can scan both linear and circular sequences. By default it uses a pairwise comparison strategy, in which every occurrence of a repeat is compared with every other occurrence. A linear comparison strategy, in which each occurrence of a repeat is compared only with the next occurrence in the sequence, is also available to speed up the analysis of large sequences.
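To make the difference between the two comparison strategies concrete, here is a minimal Python sketch. It assumes a repeat's occurrences are represented by their start positions in a plain string; the function names are hypothetical and are not part of efmcalculator2's API. With n occurrences, the pairwise strategy makes n(n-1)/2 comparisons, while the linear strategy makes only n-1, which is why the linear option scales better to large sequences.

```python
from itertools import combinations
from typing import List, Tuple


def find_occurrences(seq: str, repeat: str) -> List[int]:
    """Start indices of every (possibly overlapping) occurrence of repeat in seq."""
    return [i for i in range(len(seq) - len(repeat) + 1) if seq.startswith(repeat, i)]


def pairwise_pairs(positions: List[int]) -> List[Tuple[int, int]]:
    """Pairwise strategy: every occurrence is paired with every other occurrence."""
    return list(combinations(sorted(positions), 2))


def linear_pairs(positions: List[int]) -> List[Tuple[int, int]]:
    """Linear strategy: each occurrence is paired only with the next one along the sequence."""
    ordered = sorted(positions)
    return list(zip(ordered, ordered[1:]))
```

For example, find_occurrences("ACGTTACGTAAACGT", "ACGT") gives [0, 5, 11]; pairwise_pairs on those positions yields three pairs, whereas linear_pairs yields only (0, 5) and (5, 11).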
The EFM Calculator is available as a free web tool at efm2-beta.streamlit.app. To keep the app responsive for all users, the web tool is limited to 50,000 bases; installing and running efmcalculator2 locally, as described below, removes this restriction.
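Before uploading to the web tool, it can help to check how many bases a FASTA file contains. The sketch below is a hypothetical helper, not part of efmcalculator2; it handles FASTA only (not GenBank or CSV), and because it is not stated here whether the 50,000-base limit applies per sequence or per upload, it conservatively compares the total across all records.

```python
from typing import Dict, Iterable, Optional

WEB_BASE_LIMIT = 50_000  # base limit quoted for the hosted web tool


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first word of its header) to its length in bases."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0  # a repeated ID resets its count in this simple sketch
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def fits_web_limit(lengths: Dict[str, int], limit: int = WEB_BASE_LIMIT) -> bool:
    """Conservatively compare the total number of bases against the limit."""
    return sum(lengths.values()) <= limit
```

Usage: open the file and pass the handle directly, e.g. with open("input.fasta") as handle: print(fits_web_limit(fasta_lengths(handle))).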
Install from PyPI:
pip install efmcalculator2
Alternatively, clone this repository and install it from the repository root:
pip install ./
- -h: help
- -i: input path
- -o: output path (folder)
- -s: strategy, either "linear" or "pairwise" (default: pairwise)
- -c: treat inputs as circular
- -f: output file type for tables, either csv or parquet
- -j: number of threads
- -t: tall; parallelizes across inputs rather than within each input
- -v: verbosity level: 0 (silent), 1 (basic information), or 2 (debug)
- --summary: saves only aggregate results; useful for very tall inputs (inputs with many sequences)
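If you want to drive the command-line tool from a Python script, one option is to build the argument list from the options above and run it with subprocess. This is a minimal sketch: build_command and run_efmcalculator2 are hypothetical helpers, and only the -i, -o, -c, -s, -f, and -v options listed above are covered. Passing a list rather than a shell string avoids quoting problems with paths.

```python
import subprocess
from typing import List, Optional


def build_command(
    inpath: str,
    outpath: str,
    strategy: str = "pairwise",
    circular: bool = False,
    filetype: Optional[str] = None,
    verbose: Optional[int] = None,
) -> List[str]:
    """Assemble an efmcalculator2 argument list from the documented options."""
    if strategy not in ("linear", "pairwise"):
        raise ValueError("strategy must be 'linear' or 'pairwise'")
    cmd = ["efmcalculator2", "-i", inpath, "-o", outpath]
    if circular:
        cmd.append("-c")  # flag without a value: treat inputs as circular
    cmd += ["-s", strategy]
    if filetype is not None:
        if filetype not in ("csv", "parquet"):
            raise ValueError("filetype must be 'csv' or 'parquet'")
        cmd += ["-f", filetype]
    if verbose is not None:
        if verbose not in (0, 1, 2):
            raise ValueError("verbose must be 0, 1, or 2")
        cmd += ["-v", str(verbose)]
    return cmd


def run_efmcalculator2(cmd: List[str]) -> None:
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, build_command("input.fasta", "output_folder", strategy="linear", circular=True, verbose=2) produces the same arguments as the last command-line example in this README.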
Print efmcalculator2 help:
efmcalculator2 -h
Run efmcalculator2 on all sequences in a FASTA file using the default pairwise strategy, writing the output as CSV files to an output folder:
efmcalculator2 -i "input.fasta" -o "output_folder"
Run efmcalculator2 on all sequences in a FASTA file, treating the input as circular, using the linear strategy, printing debug information, and writing output to the folder output_folder:
efmcalculator2 -i "input.fasta" -o "output_folder" -c -s "linear" -v 2
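To illustrate what treating an input as circular adds, the sketch below finds occurrences of a repeat that span the origin of a circular sequence by appending the first k - 1 bases to the end of the sequence, where k is the repeat length. It is a hypothetical helper for illustration only, not efmcalculator2's internal implementation, and it ignores repeats longer than the sequence itself.

```python
from typing import List


def find_occurrences_circular(seq: str, repeat: str) -> List[int]:
    """Start indices of repeat in a circular sequence, including matches that wrap the origin."""
    k = len(repeat)
    if k == 0 or k > len(seq):
        return []
    # Append the first k - 1 bases so windows starting near the end can wrap around.
    extended = seq + seq[: k - 1]
    # Only start positions inside the original sequence are reported.
    return [i for i in range(len(seq)) if extended.startswith(repeat, i)]
```

For example, "CGT" never occurs when "GTAAAAC" is read linearly, but find_occurrences_circular("GTAAAAC", "CGT") returns [6] because the match runs from the last base across the origin.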