nf-binder-design

Nextflow pipelines for de novo protein binder design.

RFDiffusion -> ProteinMPNN -> AlphaFold2 initial guess
RFDiffusion Partial Diffusion
BindCraft (in parallel across multiple GPUs)
"Boltz Pulldown" (an AlphaPulldown-like protocol using Boltz-2)

Note well: Components of these workflows use RFDiffusion and BindCraft, which depend on PyRosetta/Rosetta, which is free for non-commercial use, however commercial use requires a paid license agreement with University of Washington: https://github.com/RosettaCommons/rosetta/blob/main/LICENSE.md and https://rosettacommons.org/software/licensing-faq/

Setup

Install Nextflow.

Clone the git repository:

git clone https://github.com/Australian-Protein-Design-Initiative/nf-binder-design

Binder design with RFDiffusion

Example:

OUTDIR=results
mkdir -p $OUTDIR/logs

nextflow run main.nf \
    --input_pdb target.pdb \
    --outdir $OUTDIR \
    --contigs "[A371-508/A753-883/A946-1118/A1135-1153/0 70-100]" \
    --hotspot_res "A473,A995,A411,A421" \
    --rfd_n_designs=10 \
    --rfdiffusion_batch_size 1 \
    -with-report $OUTDIR/logs/report_$(date +%Y%m%d_%H%M%S).html \
    -with-trace $OUTDIR/logs/trace_$(date +%Y%m%d_%H%M%S).txt \
    -resume \
    -profile local

If you are working on a specific cluster like M3 or MLeRP, you should omit -profile local and add the -c flag pointing to the specific platform config, eg -c conf/platforms/m3.config for M3.

A more complex example, as a wrapper script for the M3 HPC cluster, using a the site-specific config (-c), a specific RFDiffusion model (--rfd_model_path), a radius of gyration filter on the generated RFDiffusion backbones (--rfd_filters), custom ProteinMPNN weights (--pmpnn_weigths) and radius of gyration potentials (--rfd_extra_args):

#!/bin/bash
# CHANGE THIS - this is the path where your git clone of this repo is
WF_PATH="/some/path/to/nf-binder-design"

mkdir -p results/logs
DATESTAMP=$(date +%Y%m%d_%H%M%S)

# Ensure our tmp directory is in a location with enough space
export TMPDIR=$(realpath ./tmp)
export NXF_TEMP=$TMPDIR
mkdir -p $TMPDIR

# CHANGE THIS to a path in scratch or scratch2 to act as the cache directory for apptainer
# Containers will be automatically downloaded to this path.
# You can add it to ~/.bashrc if you prefer
export NXF_APPTAINER_CACHEDIR=/some/path/to/scratch2/apptainer_cache
export NXF_APPTAINER_TMPDIR=$TMPDIR

# CHANGE the --slurm_account to match the project ID you wish to run SLURM jobs under
nextflow \
-c ${WF_PATH}/conf/platforms/m3.config run \
${WF_PATH}/main.nf  \
--slurm_account=ab12 \
--input_pdb 'input/target_cropped.pdb' \
--design_name my-binder \
--outdir results \
--contigs "[B346-521/B601-696/B786-856/0 70-130]" \
--hotspot_res "B472,B476,B484,B488" \
--rfd_n_designs=1000 \
--rfd_batch_size=5 \
--rfd_filters="rg<20" \
--pmpnn_seqs_per_struct=2 \
--pmpnn_relax_cycles=1 \
--pmpnn_weigths="/models/HyperMPNN/retrained_models/v48_020_epoch300_hyper.pt" \
--rfd_model_path="/models/rfdiffusion/Complex_beta_ckpt.pt" \
--rfd_extra_args='potentials.guiding_potentials=[\"type:binder_ROG,weight:7,min_dist:10\"] potentials.guide_decay="quadratic"' \
-resume \
-with-report results/logs/report_${DATESTAMP}.html \
-with-trace results/logs/trace_${DATESTAMP}.txt

Summarize af2_initial_guess scores

This happens by default when the pipeline successfully completes, however you may want to run it mid-run to monitor how things are going:

OUTDIR=results

bin/af2_combine_scores.py -o $OUTDIR/combined_scores.tsv -p $OUTDIR/af2_results

Partial diffusion on binder designs

NOTE: It seems with output from previous designs that the binder is always named chain A, and your other chains are named B, C, etc - irrespective of the chain ID in the original target PDB file. Residue numbering is 1 to N, sequential irrespective of gaps in the chain, rather than original target chain numbering.

OUTDIR=results
mkdir -p $OUTDIR/logs

# Generate 10 partial designs for each binder, in batches of 5
# Note the 'single quotes' around the '*.pdb' glob pattern !
nextflow run partial.nf  \
    --input_pdb 'my_designs/*.pdb' \
    --rfd_n_partial_per_binder=10 \
    --rfd_batch_size=5 \
    --hotspot_res "A473,A995,A411,A421" \
    --rfd_partial_T=2,5,10,20 \
    -with-report $OUTDIR/logs/report_$(date +%Y%m%d_%H%M%S).html \
    -with-trace $OUTDIR/logs/trace_$(date +%Y%m%d_%H%M%S).txt \
    -profile local

Design filter plugin system

The pipeline supports a plugin system for calculating and filtering on custom metrics for designs. This is currently controlled by the --rfd_filters parameter.

Filters are simple Python scripts located in the bin/filters.d/ directory. Any *.py file in this directory will be automatically discovered.

Currently, only a radius of gyration (rg) filter is implemented in bin/filters.d/rg.py, which can be used as a template for creating new filters.

Plugin API

A filter plugin must implement two functions:

register_metrics() -> list[str] This function should return a list of the metric names (as strings) that the plugin can calculate. For example: return ["rg", "my_custom_score"].
calculate_metrics(pdb_files: list[str], binder_chains: list[str]) -> pd.DataFrame This function takes a list of PDB file paths and a list of binder chain IDs. It should perform its calculations and return a pandas.DataFrame with the following structure:
- The index of the DataFrame must be the design ID (i.e., the PDB filename without the .pdb extension).
- The columns must correspond to the metric names returned by register_metrics().

The main bin/filter_designs.py script will call these plugins as needed based on the filter expressions provided to the pipeline (e.g., --rfd_filters "rg<20").

Binder design with BindCraft

The bindcraft.nf helps run BindCraft trajectories in parallel across multiple GPUs. This is particularly well suited for running BindCraft on an HPC cluster, or a workstation with multiple GPUs.

Unlike the default BindCraft configuration which runs for an indeterminate amount of time until a number of accepted designs are found, this pipeline will run a fixed number of trajectories --bindcraft_n_traj and stop.

Example:

DATESTAMP=$(date +%Y%m%d_%H%M%S)

nextflow run bindcraft.nf  \
  --input_pdb 'input/PDL1.pdb' \
  --outdir results \
  --target_chains "A" \
  --hotspot_res "A56,A125" \
  --hotspot_subsample 0.5 \
  --binder_length_range "55-120" \
  --bindcraft_n_traj 2 \
  --bindcraft_batch_size 1 \
  --bindcraft_advanced_settings_preset "default_4stage_multimer" \
  --bindcraft_filters_preset "default_filters" \
  -profile local \
  -resume \
  -with-report results/logs/report_${DATESTAMP}.html \
  -with-trace results/logs/trace_${DATESTAMP}.txt

--bindcraft_advanced_settings_preset and --bindcraft_filters_preset are are those available in the BindCraft settings_advanced and settings_filters directories (without the .json extension).

--hotspot_subsample randomly takes this random proportion of the hotspot residues for each design, allowing the impact of hotspot selection to be explored in a single run.

If you have multiple GPUs per compute node, you can specify them with the --gpu_devices flag, eg --gpu_devices=0,1.

Results are saved to the --outdir directory, in the bindcraft subdirectory, with CSV outputs from each batch combined into single tables, eg bindcraft/final_design_stats.csv, eg:

── bindcraft
│   ├── accepted
│   │   └── results
│   │       └── Accepted
│   │           ├── bindcraft_design_1_l57_s942028_mpnn6_model1.pdb
│   │           └── bindcraft_design_1_l57_s942028_mpnn8_model2.pdb
│   ├── batches
│   │   ├── 0
│   │   │   └── results
│   │   │       ├── failure_csv.csv
│   │   │       ├── final_design_stats.csv
│   │   │       ├── mpnn_design_stats.csv
│   │   │       ├── Trajectory
│   │   │       └── trajectory_stats.csv
│   │   └── 1
│   │       └── results
│   │           ├── Accepted
│   │           ├── failure_csv.csv
│   │           ├── final_design_stats.csv
│   │           ├── MPNN
│   │           ├── mpnn_design_stats.csv
│   │           ├── Rejected
│   │           ├── Trajectory
│   │           └── trajectory_stats.csv
│   ├── bindcraft_report.html
│   ├── failure_csv.csv
│   ├── final_design_stats.csv
│   ├── mpnn_design_stats.csv
│   └── trajectory_stats.csv
└── logs
    ├── report_20250725_084959.html
    ├── trace_20250725_084959.txt

A report summarizing the results is generated in bindcraft_report.html.

Boltz pulldown

An AlphaPulldown-like protocol, using Boltz. This is essentially running multimer predictions for all sequences in the target set (--targets targets.fasta) against all sequences in the binder set (--binders binders.fasta).

By default, no multiple sequence alignments are used. If you set the --create_target_msa=true and --create_binder_msa=true flags, target and binder multiple sequence alignments will be generated, respectively.

For de novo designed binders a typical pattern might be to use --create_target_msa=true to allow the target to use an MSA to improve prediction accuracy, but not attempt to find homologs for the de novo designed partner.

Specifying --use_msa_server will use the remote ColabFold mmseq2 server to generate multiple sequence alignments.

If generating the MSAs locally, indexed mmseqs2 databases can be downloaded and generated with the ColabFold setup_databases.sh script. You'll need to specify the --uniref30 and --colabfold_envdb paths to point to these databases.

You can add additional args to the Boltz command line via ext.args for the BOLTZ process in nextflow.config, eg:

process {
    withName: BOLTZ {
        accelerator = 1
        time = 2.hours
        memory = '8g'
        cpus = 2
        
        // if using CPU only
        ext.args = "--accelerator cpu"
    }
}

By default, boltz_pulldown.nf will output to results/boltz_pulldown, which includes the Boltz outputs with scores and predicted structures, and a summary table boltz_pulldown.tsv. It also outputs boltz_pulldown_report.html with summary statistics and plots of the binder/target ipTM scores.

License

MIT

Note that some software dependencies of the pipeline are under less permissive licenses - in particular, RFDiffusion and BindCraft use Rosetta/PyRosetta which is only free for Non-Commercial use.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

nf-binder-design

Setup

Binder design with RFDiffusion

Summarize af2_initial_guess scores

Partial diffusion on binder designs

Design filter plugin system

Plugin API

Binder design with BindCraft

Boltz pulldown

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
assets		assets
bin		bin
conf/platforms		conf/platforms
examples		examples
models		models
modules		modules
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md
bindcraft.nf		bindcraft.nf
boltz_pulldown.nf		boltz_pulldown.nf
combine_scores.sh		combine_scores.sh
main.nf		main.nf
nextflow.config		nextflow.config
partial.nf		partial.nf

License

tlitfin-unsw/nf-binder-design

Folders and files

Latest commit

History

Repository files navigation

nf-binder-design

Setup

Binder design with RFDiffusion

Summarize af2_initial_guess scores

Partial diffusion on binder designs

Design filter plugin system

Plugin API

Binder design with BindCraft

Boltz pulldown

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages