A command-line tool for reconstructing pseudogenes in prokaryotic genomes.
# Clone the repository
git clone https://github.com/Floto-Lab/pseudomancer.git
cd pseudomancer
# Create conda environment with all dependencies
mamba env create -f environment.yml
mamba activate pseudomancer_env
Requirements: Mamba or Conda package manager
# Activate environment (if not already active)
mamba activate pseudomancer_env
# Run pipeline
python -m pseudomancer --genus Mycobacterium --genome target_genome.fasta --out_dir results/
- --genus: Genus name for downloading reference proteins from NCBI RefSeq
- --genome: Target genome FASTA file to search for pseudogenes
- --out_dir: Output directory for results
- --evalue: E-value threshold (default: 1e-5)
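When running the pipeline from a script (for example, over several genomes), the options above can be assembled into a command programmatically. The following is a minimal sketch, not part of pseudomancer itself: the helper name `build_pseudomancer_command` is hypothetical, and only the flags listed above are used.

```python
import shlex
import sys


def build_pseudomancer_command(genus, genome, out_dir, evalue=None):
    """Build the argument list for one pseudomancer run.

    Hypothetical helper for illustration; it only uses the documented flags.
    """
    cmd = [
        sys.executable, "-m", "pseudomancer",
        "--genus", genus,
        "--genome", str(genome),
        "--out_dir", str(out_dir),
    ]
    # Omit --evalue unless the caller sets it, so the tool's default applies.
    if evalue is not None:
        cmd += ["--evalue", str(evalue)]
    return cmd


if __name__ == "__main__":
    cmd = build_pseudomancer_command(
        "Mycobacterium", "target_genome.fasta", "results/", evalue=1e-10
    )
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # inside the activated environment to actually execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.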
- Identifies all open reading frames (ORFs) in the target genome using getorf
- Downloads all complete, annotated genomes for the specified genus from NCBI RefSeq
- Extracts and merges protein sequences from all assemblies
- Clusters proteins at 99% identity using mmseqs2 to create a non-redundant dataset
- Searches the clustered proteins against your target genome using mmseqs2 (tblastn-like search)
- Outputs results in tabular format with alignment statistics
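The ORF-finding, clustering, and search steps above roughly correspond to three external tool invocations. The sketch below only builds those commands so they can be inspected; it is an illustration under assumptions, not pseudomancer's actual implementation. The helper name `build_step_commands` and all file names are hypothetical, and the flags shown are standard getorf (EMBOSS) and MMseqs2 options that may differ from the ones pseudomancer passes internally. The download and merge steps are left out, and `proteins` stands for the merged reference protein FASTA.

```python
from pathlib import Path


def build_step_commands(genome, proteins, out_dir, evalue=1e-5):
    """Return getorf and mmseqs2 commands approximating the pipeline steps.

    All output file names here are assumptions made for illustration.
    """
    out = Path(out_dir)
    tmp = out / "tmp"                      # scratch directory required by mmseqs2
    orfs = out / "orfs.fasta"              # ORFs found in the target genome
    cluster_prefix = out / "clustered"     # prefix for easy-cluster outputs
    # easy-cluster writes representatives to <prefix>_rep_seq.fasta
    rep_seqs = Path(f"{cluster_prefix}_rep_seq.fasta")
    hits = out / "hits.m8"                 # tabular search results

    return [
        # 1. Find open reading frames in the target genome.
        ["getorf", "-sequence", str(genome), "-outseq", str(orfs)],
        # 2. Cluster reference proteins at 99% sequence identity.
        ["mmseqs", "easy-cluster", str(proteins), str(cluster_prefix), str(tmp),
         "--min-seq-id", "0.99"],
        # 3. Search the representative proteins against the genome
        #    (protein query against nucleotide target, a tblastn-like search).
        ["mmseqs", "easy-search", str(rep_seqs), str(genome), str(hits), str(tmp),
         "-e", str(evalue)],
    ]


if __name__ == "__main__":
    for command in build_step_commands("target_genome.fasta", "proteins.faa", "results"):
        print(" ".join(command))
```

Each command can be run with `subprocess.run(command, check=True)` once the environment is active and the input files exist.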