
Caching

sea-shunned edited this page Oct 21, 2024 · 4 revisions

Introduction

AI OnDemand (AIoD) takes advantage of shared directories to cache/store models, results, and conda environments, minimising storage use and running time.

Note that we refer to these centralised directories as "caches", but they are not expected to be temporary storage!

Cache Location

By default, the central cache directory is $HOME/.nextflow/aiod. Both models and results are stored here. This base directory can be changed in the Napari plugin, and the change is remembered between sessions.
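The default location above can be sketched in Python as follows (the subdirectory layout under the cache root is an assumption for illustration, not taken from the plugin's source):

```python
from pathlib import Path

# Default AIoD cache root described above.
cache_root = Path.home() / ".nextflow" / "aiod"

# Assumed (illustrative) subdirectories for the two kinds of cached data.
models_dir = cache_root / "models"
results_dir = cache_root / "results"
```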

Caching Models

If the location of a model is specified by a URL, the model is downloaded into the cache, unless it is already present there, in which case this step is skipped. If the location is specified by a local path (whose accessibility has already been checked from within the Napari plugin), the file at that path is copied into the cache.
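The download-or-copy logic can be sketched as below. This is a hypothetical illustration of the behaviour described above; the function name, cache-key scheme, and filenames are assumptions, not the plugin's actual API:

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

def fetch_model(location: str, cache_dir: Path) -> Path:
    """Return a cached copy of the model at `location` (URL or local path)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the location so models from distinct sources never collide.
    key = hashlib.sha256(location.encode()).hexdigest()[:16]
    cached = cache_dir / f"{key}_{Path(location).name}"
    if cached.exists():
        return cached  # already cached: skip download/copy
    if location.startswith(("http://", "https://")):
        urllib.request.urlretrieve(location, cached)  # URL: download once
    else:
        shutil.copy(Path(location), cached)  # local path: copy into the cache
    return cached
```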

Caching Results

Segmentation Masks

The created segmentation masks are stored in a model-version-specific directory under $HOME/.nextflow/aiod. While the user can then export these (through the Napari plugin) to another location, the masks in this directory are considered "cached". After a pipeline run has been triggered (from the Napari plugin), this cache is checked for existing results (specific to that input file, model version, and any input parameters); if found, they are loaded and that file is skipped.
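The cache lookup described above keys on the input file, model version, and parameters. A minimal sketch of such a key, with hypothetical function names and file layout (not the plugin's actual implementation):

```python
import hashlib
import json
from pathlib import Path

def mask_cache_path(cache_root: Path, input_file: str,
                    model_version: str, params: dict) -> Path:
    """Deterministic cache location for a mask, keyed on file + parameters."""
    # Serialise parameters with sorted keys so equal settings hash equally.
    blob = json.dumps({"input": input_file, "params": params}, sort_keys=True)
    key = hashlib.sha256(blob.encode()).hexdigest()[:16]
    # Model-version-specific subdirectory, as described above.
    return cache_root / model_version / f"{Path(input_file).stem}_{key}.tif"

def run_or_skip(cache_root: Path, input_file: str,
                model_version: str, params: dict):
    out = mask_cache_path(cache_root, input_file, model_version, params)
    if out.exists():
        return out, True  # cached mask found: load it and skip this file
    # ...otherwise the pipeline would segment the file and write `out`...
    return out, False
```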

Caching Conda Environments

The Nextflow pipeline constructs a conda environment for each model and step in the process, each defined by a relevant YAML file (located here).
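These YAML files follow the standard conda environment format. A minimal illustrative example (the name and dependencies below are placeholders, not taken from the actual AIoD files):

```yaml
# Illustrative conda environment definition (placeholder name and deps).
name: aiod-step
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
```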

By default, Nextflow creates each environment in the pipeline work directory. This is both time-consuming and unnecessary in the case of AIoD. Instead, Nextflow can be given a central cache directory for conda environments; we do this by setting conda.cacheDir in the Nextflow profiles.

If a central, readable location on HPC is specified, then the required conda environments will only be built once across the institution/group, minimising time and footprint.
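A minimal nextflow.config sketch of this setting, assuming a shared directory on the cluster (the path below is illustrative, as is the profile name):

```groovy
// Minimal nextflow.config sketch (illustrative profile name and path).
profiles {
    cluster {
        conda.enabled  = true
        // Central, shared location so each environment is built only once.
        conda.cacheDir = '/shared/group/.nextflow/conda-cache'
    }
}
```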
