Indel detection software designed for use on MiXCR outputs.
It takes two input files: Assembled-annotation and Proteomic-database.
The pipeline is outlined as follows:
- Create the directories output/fastas, output/mmseqs, and output/igblast, where "output" is a user-defined directory.
- Pull the cluster representative sequences from Proteomic-database and write them to a FASTA file.
- Pull the sequences from Assembled-annotation and write them to a FASTA file.
- Create an MMseqs database for the Assembled-annotation FASTA.
- Create an MMseqs database for the Proteomic-database FASTA.
- Search the Assembled-annotation database against the Proteomic-database database. Write the alignments to an m8 file.
- Load the m8 file and the original Proteomic-database. For each cluster, compute the number of peaks along the sequence-length axis. Flag every cluster that meets a cluster-size threshold and shows an increase in the number of peaks. Then open the Assembled-annotation file and write every sequence from the flagged clusters to an abbreviated Assembled-annotation file and a FASTA file.
- Run IgBLAST on the abbreviated Assembled-annotation FASTA file.
- Run insertion_finder.py on IgBLAST MSAs.
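
A few of the steps above can be sketched in Python. This is only an illustration under stated assumptions, not the pipeline's actual code. The function names (make_output_dirs, mmseqs_commands, count_peaks), the database names (queryDB, targetDB, resultDB), and the file name alignments.m8 are all hypothetical. The MMseqs2 subcommands createdb, search, and convertalis exist, but whether the pipeline calls exactly these is an assumption. The peak definition is also assumed, since the outline does not specify one.

```python
from collections import Counter
from pathlib import Path


def make_output_dirs(output):
    """Create the fastas, mmseqs and igblast subdirectories under `output`."""
    root = Path(output)
    paths = [root / name for name in ("fastas", "mmseqs", "igblast")]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def mmseqs_commands(query_fasta, target_fasta, workdir):
    """Build (but do not run) MMseqs2 commands for the database and search steps.

    ASSUMPTION: the query is the Assembled-annotation FASTA and the target is
    the Proteomic-database FASTA; all database and file names are made up.
    """
    work = Path(workdir)
    query_db, target_db = str(work / "queryDB"), str(work / "targetDB")
    result_db, tmp_dir = str(work / "resultDB"), str(work / "tmp")
    m8_file = str(work / "alignments.m8")
    return [
        ["mmseqs", "createdb", str(query_fasta), query_db],
        ["mmseqs", "createdb", str(target_fasta), target_db],
        ["mmseqs", "search", query_db, target_db, result_db, tmp_dir],
        # convertalis turns the result database into a tab-separated m8 file
        ["mmseqs", "convertalis", query_db, target_db, result_db, m8_file],
    ]


def count_peaks(lengths):
    """Count local maxima in the histogram of sequence lengths.

    ASSUMPTION: a peak is a length whose count is strictly greater than the
    counts at length - 1 and length + 1 (missing lengths count as 0), so a
    plateau of equal neighbouring counts is not reported as a peak.
    """
    hist = Counter(lengths)
    return sum(
        1
        for length, n in hist.items()
        if n > hist.get(length - 1, 0) and n > hist.get(length + 1, 0)
    )
```

Each command list could be passed to subprocess.run(cmd, check=True) once MMseqs2 is installed. Comparing count_peaks for a cluster's lengths before and after the alignments are added is one possible way to detect "an increase in the number of peaks".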