Integration of Automated Machine Learning (AutoML) methods into nnU-Net. Free software: BSD license.
The repository is structured in the following directories:
- `autonnunet`: The AutoNNU-Net Python package, including
  - `analysis`: Plotting, DeepCAVE utilities
  - `datasets`: MSD dataset handling
  - `evaluation`: Prediction tools for the MSD test set
  - `experiment_planning`: Extensions to the nnU-Net prediction tools for AutoNNU-Net
  - `hnas`: Hierarchical NAS search space and integration into AutoNNU-Net
  - `inference`: Prediction within AutoNNU-Net
  - `utils`: Collection of various utilities, e.g., paths
- `data`: Everything related to (MSD) datasets
- `output`: Everything that is generated by AutoNNU-Net locally, e.g. optimization results, MSD submissions
- `results_zipped`: Compressed output, this is stored in the repo
- `runscripts`: The actual scripts to execute experiments etc.
- `submodules`: Git submodules, e.g. hypersweeper, nnU-Net etc.
- `tests`: Unit tests for AutoNNU-Net
- `paper`: Plots and tables generated by the plotting scripts
Important: This code was only tested for Rocky Linux 9.5 and CUDA 12.4. Other operating systems/GPUs/CUDA versions may not be supported. In order to install AutoNNU-Net, CUDA drivers are highly recommended - otherwise the installation of PyTorch may fail. On HPCs, for example, this means that you have to load the CUDA module before installing the package.
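On many HPC systems, the CUDA module can be loaded via the environment modules system before installing the package; the module name below is only an assumption and depends on your cluster:
# load the CUDA module before installing the package (module name is cluster-specific)
module load CUDA/12.4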
Important: Due to compatibility issues with `numpy`, `DeepCAVE` is not listed as a requirement of AutoNNU-Net. However, in order to create the plots and tables, you need to install `DeepCAVE`. We therefore recommend installing `DeepCAVE` manually after running the experiments.
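For example, assuming the standard PyPI package works in your environment:
# install DeepCAVE only after the experiments have finished
pip install deepcave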
- Clone the repository and its submodules
git clone --recursive https://github.com/automl/AutoNNUnet.git autonnunet
cd autonnunet
- Create and activate an Anaconda/Miniconda environment with Python 3.10
conda create -n autonnunet python=3.10
conda activate autonnunet
- Install AutoNNU-Net
make install
Important: The automated installation is convenient if you want to install all submodules automatically. However, it is also quite sensitive to system-specific Python and package versions. Therefore, if the installation using make fails, we recommend installing the subpackages manually:
# submodules
cd submodules/batchgenerators && git checkout master && git pull && pip install . && cd ../../
cd submodules/hypersweeper && git checkout dev && git pull && pip install . && cd ../../
cd submodules/MedSAM && git checkout MedSAM2 && git pull && pip install . && cd ../../
cd submodules/neps && git checkout master && git pull && pip install . && cd ../../
cd submodules/nnUNet && git checkout dev && git pull && pip install . && cd ../../
# AutoNNUNet
pip install -e ".[dev]"
For our experiments, we used `submitit-slurm` to run code on a SLURM cluster. You can define your custom SLURM cluster configuration in `runscripts/configs/cluster`.
We ran all experiments using the `gpu` cluster configuration.
If you want to run your experiments locally, please use `cluster=local` for every command that uses Hydra.
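For example, a local training run for D01 on a single fold could look like this (illustrative only; adjust dataset and fold as needed):
python runscripts/train.py -m "dataset=Dataset001_BrainTumour" "fold=0" "cluster=local"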
To download a specific dataset, run
python autonnunet/datasets/msd_dataset.py --dataset_name=<dataset>
For example, to download D01 (BrainTumour), run:
python autonnunet/datasets/msd_dataset.py --dataset_name=Dataset001_BrainTumour
To download all datasets, run
./runscripts/download_msd.sh
Important: This has to be executed on the same cluster/compute environment that is targeted for training in order to get the correct nnU-Net configurations, e.g. by appending `cluster=gpu`.
python runscripts/convert_and_preprocess_nnunet.py -m "dataset=glob(*)"
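For example, to preprocess only D01 with the `gpu` cluster configuration appended (illustrative; D01 stands in for any dataset):
python runscripts/convert_and_preprocess_nnunet.py -m "dataset=Dataset001_BrainTumour" "cluster=gpu"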
python runscripts/train.py -m "dataset=glob(*)" "fold=range(5)"
python runscripts/train.py -m "dataset=glob(*)" "fold=range(5)" "hp_config.encoder_type=ResidualEncoderM"
python runscripts/train.py -m "dataset=glob(*)" "fold=range(5)" "hp_config.encoder_type=ResidualEncoderL"
Important: Before you can run the MedSAM2 fine-tuning for a dataset, you first need to run the training for at least one of the nnU-Net models on that dataset, since these runs create the dataset splits.
Important: The pre-processing for MedSAM2 must be executed locally, i.e. it cannot be submitted to a SLURM cluster, due to compatibility issues between pickle and multiprocessing.
python runscripts/convert_and_preprocess_medsam2.py -m "dataset=glob(*)" "cluster=local"
- Download model checkpoint
cd submodules/MedSAM && mkdir checkpoints && cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
cd ../../../
- Fine-tune MedSAM2
python runscripts/finetune_medsam2.py -m "dataset=glob(*)" "fold=range(5)"
python runscripts/determine_hyperband_budgets.py --b_min=10 --b_max=1000 --eta=3
python runscripts/train.py --config-name=tune_hpo -m "dataset=Dataset001_BrainTumour"
python runscripts/train.py --config-name=tune_hpo_nas -m "dataset=Dataset001_BrainTumour"
Incumbent configurations are stored in `runscripts/configs/incumbent`. You can find our incumbent configurations already in this directory.
If you want to re-create them after running the experiments, you need to run:
python runscripts/extract_incumbents.py --approach=hpo
Using these configs, you can then run the training of the incumbent configurations using the following command:
python runscripts/train.py -m "dataset=<dataset_name>" "+incumbent=Dataset001_BrainTumour_<approach>" "fold=range(5)" "pipeline.remove_validation_files=False"
Please note that you could also use the models saved during the optimization. In our experiments, however, we did not store model checkpoints in the respective run directories in order to reduce storage consumption.
To run nnU-Net with the incumbent configuration for the HPO approach on D01, run
python runscripts/train.py -m "dataset=Dataset001_BrainTumour" "+incumbent=Dataset001_BrainTumour_hpo" "fold=range(5)"
For the cross-evaluation of incumbent configurations, we selected the 9 out of 10 datasets on which HPO+NAS achieved an improvement. To train all datasets with the incumbent configuration of another dataset, run
./runscripts/train_cross_eval.sh <dataset_name>
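For example, to train all datasets with the incumbent configuration found on D01 (illustrative; any of the selected datasets can be used):
./runscripts/train_cross_eval.sh Dataset001_BrainTumour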
python runscripts/run_inference.py --approach=<approach>
Or directly submit it to SLURM:
sbatch runscripts/run_inference.sh <approach>
This creates the MSD submission in `output/msd_submissions`.
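For example, assuming `hpo` is a valid value for --approach (analogous to extract_incumbents above), running inference for the HPO incumbents could look like:
python runscripts/run_inference.py --approach=hpo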
To generate all plots and tables in the paper and store them in `output/paper`, run
python runscripts/plot.py
Sometimes during optimization, jobs fail while loading cached TorchInductor files. To fix this, run
rm -rf ~/.cache/torch
rm -rf ~/.cache/triton/
rm -rf ~/.nv/ComputeCache
This package was created with [Cookiecutter](https://github.com/audreyr/cookiecutter) and the [audreyr/cookiecutter-pypackage](https://github.com/audreyr/cookiecutter-pypackage) project template.