
1 QuickStart

Sophie edited this page Apr 15, 2025 · 6 revisions

Install requirements

  • Nextflow
  • Singularity/Apptainer or Docker
  • Sufficient scratch space and RAM (for example, aligning 300 sequences of 400 residues with 30% sequence identity needs 30 GB of disk space and 32 GB of RAM)
  • A copy of this repository:
    git clone https://github.com/Bio2Byte/simsapiper.git
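Before cloning, you may want to confirm that the tools listed above are available. The script below is a minimal sketch and is not shipped with SIMSAPiper. The helper name `have` is our own, and the script only checks that the executables are on PATH. It does not check versions, free disk space or RAM.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SIMSAPiper):
# report whether Nextflow and at least one container engine are on PATH.

# have TOOL -> exit status 0 if TOOL can be found on PATH
have() { command -v "$1" >/dev/null 2>&1; }

missing=0

if have nextflow; then
  echo "found: nextflow"
else
  echo "missing: nextflow"
  missing=1
fi

# One container engine is enough: Singularity/Apptainer or Docker.
if have singularity || have apptainer || have docker; then
  echo "found: container engine"
else
  echo "missing: singularity, apptainer or docker"
  missing=1
fi

echo "missing requirements: $missing"
```

If the last line reports 0, Nextflow and a container engine were both found.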
    

Prepare data

Use the toy_example directory to test the installation. If no data directory is specified, SIMSAPiper automatically looks for a directory called data. This directory contains:

  • Subdirectory seqs with FASTA-formatted protein sequences
  • Optional: subdirectory structures with 3D protein structure models
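To illustrate the expected layout, the commands below create a new input directory by hand. The directory name my_project/data, the file name example.fasta and the two short sequences are made-up placeholders, and the .fasta extension is an assumption. Only the subdirectory names seqs and structures come from the description above.

```shell
#!/usr/bin/env bash
# Sketch: build an input directory with the seqs/ and structures/ layout.
# "my_project/data" and "example.fasta" are placeholder names.

data_dir="my_project/data"
mkdir -p "$data_dir/seqs" "$data_dir/structures"   # structures/ is optional

# Tiny made-up FASTA file; replace it with your own protein sequences.
cat > "$data_dir/seqs/example.fasta" <<'EOF'
>seq1
MKTAYIAKQRQISFVKSHFSRQ
>seq2
MKTAYIAKQRQISFVKAHFSRQ
EOF

# Count the sequences: every FASTA header line starts with '>'.
grep -c '^>' "$data_dir/seqs/example.fasta"
```

You would then pass the absolute path of my_project/data to --data.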

Launch the pipeline from the command line

Enable the recommended settings with --magic

nextflow run simsapiper.nf -profile server,withsingularity --data $PWD/toy_example/data --magic

or use

./magic_align.sh

This file can also be double-clicked to run the toy_example dataset.

Use absolute file paths (for example /Users/me/workspace/simsapiper/toy_example/data).
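One way to turn a relative path into an absolute one in the shell is shown below. This is a small sketch that relies only on `cd` and `pwd`, so it also works on systems without `realpath`. The function name abs_path is hypothetical, and the script prints the --magic command with the resolved path instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: resolve a relative directory to an absolute path before using --data.

abs_path() {
  # Subshell keeps the caller's working directory unchanged;
  # CDPATH is cleared so cd does not print or jump elsewhere.
  (CDPATH= cd -- "$1" && pwd)
}

mkdir -p toy_example/data   # no-op inside the repository; lets the sketch run standalone
DATA_DIR="$(abs_path toy_example/data)"

echo "nextflow run simsapiper.nf -profile server,withsingularity --data $DATA_DIR --magic"
```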

By default, most flags are set to False. Adding a flag to the command line sets it to True and activates it. Some flags also take a value, such as a percentage or a filename. The complete list can be found below.

The --magic flag is equivalent to:

nextflow run simsapiper.nf \
    -profile server,withsingularity \
    --seqFormat fasta \
    --seqQC 5 \
    --dropSimilar 90 \
    --outFolder $PWD/simsa_time_of_execution \
    --outName "magicMsa" \
    --minSubsetID "min" \
    --createSubsets 30 \
    --retrieve \
    --model \
    --strucQC 5 \
    --dssp \
    --squeeze "H,E" \
    --squeezePerc 80 \
    --reorder \
    --data $PWD/toy_example/data

Other presets:

--minimagic to align small datasets (<50 sequences)

--localmagic to align datasets while predicting 3D structures locally with ESMFold
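If you are unsure which preset to use, the sketch below picks one based on the number of input sequences, using the 50-sequence threshold given for --minimagic. It is not part of SIMSAPiper. The function choose_preset is hypothetical, and it assumes every file in seqs/ is FASTA with headers starting with '>'. It ignores --localmagic, which depends on whether you want to predict structures locally rather than on dataset size, and it only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: --minimagic for fewer than 50 sequences, else --magic.

choose_preset() {
  local data_dir="$1"
  local n
  # Count FASTA headers across all files in seqs/.
  n=$(cat "$data_dir"/seqs/* | grep -c '^>')
  if [ "$n" -lt 50 ]; then
    echo "--minimagic"
  else
    echo "--magic"
  fi
}

# Demo on a throwaway directory; mktemp -d returns an absolute path.
demo="$(mktemp -d)"
mkdir -p "$demo/seqs"
printf '>a\nMKV\n>b\nMKL\n' > "$demo/seqs/small.fasta"

preset="$(choose_preset "$demo")"
echo "nextflow run simsapiper.nf -profile server,withsingularity --data $demo $preset"
```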
