
Model Registry


To unify model definitions, simplify the provision of information to the Napari plugin (and other future interfaces) for automatic UI construction, and enable community contributions, we have a model registry repository that defines the models available within AI OnDemand (AIoD).

Contributing

This section covers how to contribute to the registry, whether it's a completely new model, a new model version, or a model for a completely new task (e.g. a new organelle). Depending on what is contributed, additional work may be needed (e.g. updating or adding a Python script in the Segment-Flow pipeline). If so, this will be outlined with relevant links in the sections below.

Contribute a New Model

To add a new model, a new manifest needs to be added to the manifest directory in the repo. Note that if this includes a new task (e.g. new thing to segment), you'll also need to add a new task.

The schema section below provides an outline of the bare minimum needed, but you're encouraged to look at previously-created manifests, and to check that your manifest passes validation locally against the Pydantic model before opening a pull request. Either way, validation runs on every pull request, and the PR will not be merged until it passes.
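As a rough guide, local validation might look something like the sketch below. The import path (registry.schema.ModelManifest) and the manifests/*.yaml layout are assumptions for illustration; check the registry repo for the actual module and file names.

```python
# A minimal local-validation sketch. The import path and file layout are
# assumptions; adapt them to the actual registry repo.
from pathlib import Path

import yaml
from pydantic import ValidationError

from registry.schema import ModelManifest  # hypothetical import path

for manifest_path in Path("manifests").glob("*.yaml"):
    data = yaml.safe_load(manifest_path.read_text())
    try:
        ModelManifest.model_validate(data)  # Pydantic v2 validation
        print(f"{manifest_path.name}: OK")
    except ValidationError as err:
        print(f"{manifest_path.name}: FAILED\n{err}")
```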

Note

Adding a new (base) model also requires an accompanying Python script and process in the Segment-Flow pipeline. For further details, see the relevant page.

Contribute a New Model Version

To add a new model version, simply add the version to the appropriate existing manifest, then open a pull request, where the updated manifest will be validated against the schema (though you can test this locally first to make sure).

What do I mean by model version? Basically, any model that can also be run using the same Python script that is currently defined in the Nextflow pipeline. Typically, this would be another model within an existing library (e.g. Cellpose's cyto2 and cyto3), or a similar architecture with different weights (e.g. a fine-tuned model). In theory, we could have a very generic PyTorch script that just loads a model and runs a prediction, but models like SAM, Cellpose, etc. often have their own libraries and pre- and post-processing steps, which means it's best to keep broader, model-specific scripts.
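For instance, adding a new version to an existing manifest might look roughly like the following. The field names and structure here are illustrative assumptions, not the registry's exact schema; the key point is that the new version reuses the same base-model script.

```python
# Illustrative only: field names and structure are assumptions, not the exact schema.
cellpose_manifest = {
    "name": "Cellpose",
    "short_name": "cellpose",
    "versions": {
        "cyto2": {"tasks": ["cyto"], "location": "https://example.org/cyto2.pt"},
        # New version added alongside the existing one; it is run by the same
        # Cellpose script in the Nextflow pipeline:
        "cyto3": {"tasks": ["cyto"], "location": "https://example.org/cyto3.pt"},
    },
}
```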

See the schema section below for further guidelines on what is needed to define a model version.

Contribute a Model with a New Task

To contribute a model or model version with a new task, the list of available tasks will need to be updated, as these are used to constrain model schemas and define what is available in the Napari plugin.

In the model schema, TASK_NAMES is a dictionary that defines the short-hand name (key) and the display name (value) for a given task. Simply add the new key:value pair, and make a pull request to add this new task.
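For example (the existing entries below are illustrative assumptions; only the new pair needs adding):

```python
# TASK_NAMES maps a task's short-hand name (key) to its display name (value).
TASK_NAMES = {
    "mito": "Mitochondria",         # existing entry (illustrative)
    "nucleus": "Nuclei",            # existing entry (illustrative)
    "er": "Endoplasmic Reticulum",  # new task being contributed
}
```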

The Napari UI will automatically update to include this new task, and the model schema will be updated to include this new task as an option for any new models or model versions such that they can pass validation.

Add a New Filepath

As illustrated in this schema, the same model can be defined in multiple locations. This can be useful if a model is not public, but internal to an institution with restricted access, or if your institution has restrictions around incoming traffic on HPC. The model may, therefore, exist as a few copies for each lab/usergroup that has access to it, or could exist on separate workstations. Ideally, this would be handled via user permissions, but the functionality is there if needed.

Simply add the new path to the list, or, if the location is currently a single value, first turn it into a list (using square brackets) and then add the new path.
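As a sketch (the field name and paths here are illustrative assumptions):

```python
# Before: a single location for the model version
version = {"location": "/shared/lab_a/models/model_v1.pt"}

# After: the location turned into a list (square brackets), with the new path added
version = {
    "location": [
        "/shared/lab_a/models/model_v1.pt",
        "/shared/lab_b/models/model_v1.pt",
    ]
}
```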

Schema

While schemas are not always the most readable for humans, a few perspectives are given below:

  • The Pydantic model used for parsing and validation can be found (here)
  • The generated JSON schema from the Pydantic model (here)
  • Existing manifests, all in the manifests directory, which should help clarify what is needed!

Overall, models are specified hierarchically, from a base model to a model version to a task-specific version. The following information is required for a schema (a sketch of an example manifest follows the list):

  • A model name (the short_name is used as the name for the Python script and conda environments in the Nextflow pipeline)
  • Model versions
    • For each version: its name, each of the tasks that version is trained for (normally one), and the model location (either a filepath or a URL)
    • Optionally, a path/URL to a config file can also be provided in case any additional parameters are needed that are not defined by users
  • Relevant metadata
    • While a DOI is not required, some basic information about the model is needed, and will be reviewed upon a PR. See the relevant contribution section.
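Putting the above together, a minimal manifest might look something like the sketch below. Field names and values are illustrative assumptions; the Pydantic model and the existing manifests in the repo define the exact schema.

```python
# A minimal manifest sketch; field names and values are assumptions, not the exact schema.
manifest = {
    "name": "My Segmentation Model",
    "short_name": "mymodel",  # also names the Python script and conda env in Nextflow
    "metadata": {
        "description": "Brief description of the model and what it segments.",
        "doi": None,  # optional
    },
    "versions": {
        "v1": {
            "tasks": ["mito"],  # normally a single task per version
            "location": "https://example.org/weights/mymodel_v1.pt",
            "config_path": "https://example.org/configs/mymodel_v1.yaml",  # optional
        },
    },
}
```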

Each model version represents a variant of a base model, where the differences may be different input data (e.g. for a different task), different checkpoints/hyperparameters for the same model, or even architectural differences (e.g. varying sizes). Ultimately, as long as the underlying Python script that runs the model handles everything needed for that version (if anything extra is needed), that's enough.

If the version is a big enough departure that a different environment is needed, it may be better to create a new base model, but this is up to the contributor and will be reviewed upon a PR.

Note

Any parameters given at the root input level apply to all model versions. However, a config_path or list of params can be given to a specific model-task-version if it differs from the root.
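A hedged sketch of how root-level parameters and per-version overrides might relate (field names are assumptions):

```python
# Illustrative only: root-level params apply to every version; a specific
# model-task-version can supply its own params (or a config_path) to override them.
manifest = {
    "short_name": "mymodel",
    "params": {"patch_size": 256},  # root level: applies to all versions
    "versions": {
        "v1": {"tasks": ["mito"], "location": "https://example.org/v1.pt"},
        "v2": {
            "tasks": ["mito"],
            "location": "https://example.org/v2.pt",
            "params": {"patch_size": 512},  # overrides the root value for v2 only
        },
    },
}
```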
