New input files are not detected when run config uses globs #645

@marxide

Description

When a user wishes to add images to an existing pipeline run, they modify the config to include the new inputs and relaunch the run. A check is performed to ensure that:

  1. New inputs have been added to the config, and
  2. No other settings have been changed.

Both of these conditions must be true for a pipeline run to be re-run in "add mode". The pipeline checks whether the inputs have changed by reading the previous config file config_prev.yml and comparing it with the updated config.yml file. Both config files are parsed and validated, and all glob expressions in them are resolved at that point.

Suppose that the config inputs are a simple glob expression, e.g.

inputs:
  image:
    glob: /data/vast-survey/VAST/release/EPOCH*/COMBINED/STOKESI_IMAGES/*.fits

If new files that match this expression are added to the filesystem, the pipeline will fail to detect that the inputs have changed. It will read both config_prev.yml and config.yml, which would contain the same glob expression in this case, and compare them. Since the globs are resolved when the config file is read, both config files will end up with the same list of inputs even though new files matching the glob were added since the run was executed.
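The failure can be reproduced with a minimal sketch. This is not the pipeline's actual config parser: resolve_inputs and inputs_changed are hypothetical stand-ins that expand a glob at read time and compare the results, as described above, and the directory layout is a simplified version of the example path.

```python
import glob
import os
import tempfile


def resolve_inputs(pattern):
    """Hypothetical stand-in for config parsing: expand the glob when the config is read."""
    return sorted(glob.glob(pattern))


def inputs_changed(prev_pattern, new_pattern):
    """Naive diff check: resolve both configs now and compare the resulting lists."""
    return resolve_inputs(prev_pattern) != resolve_inputs(new_pattern)


def demo():
    with tempfile.TemporaryDirectory() as root:
        pattern = os.path.join(root, "EPOCH*", "*.fits")

        # First run: only one epoch exists on disk.
        os.makedirs(os.path.join(root, "EPOCH01"))
        open(os.path.join(root, "EPOCH01", "a.fits"), "w").close()
        used_by_first_run = resolve_inputs(pattern)

        # A new epoch arrives after the first run has completed.
        os.makedirs(os.path.join(root, "EPOCH02"))
        open(os.path.join(root, "EPOCH02", "b.fits"), "w").close()

        # config_prev.yml and config.yml hold the same glob, so both
        # resolve to the same, current file list and no change is seen.
        detected = inputs_changed(pattern, pattern)
        return len(used_by_first_run), len(resolve_inputs(pattern)), detected
```

Calling demo() returns (1, 2, False): the first run used one image, two images now match the glob, and the check still reports no change.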

The problem is that the config diff check only compares the two parsed config files; it does not look at which images were actually used by the previous run.

A potential solution would be to extend the config diff check to compare the number of resolved inputs in config.yml with the number of images stored in the Run object (i.e. Run.n_images). If the number of resolved inputs is greater than the number of images in the Run object, then the run should be re-run in add mode. This won't work if images were removed, but that isn't allowed for "add mode" anyway.
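The proposed check could look roughly like the sketch below. It is only an illustration under stated assumptions: the Run dataclass is a minimal stand-in for the pipeline's Run object (only n_images comes from this issue), and should_run_in_add_mode and its other_settings_changed argument are hypothetical names, not existing pipeline functions.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Minimal stand-in for the pipeline's Run object; only n_images is from the issue."""
    n_images: int


def should_run_in_add_mode(resolved_inputs, run, other_settings_changed):
    """Return True if the run should be re-run in add mode.

    resolved_inputs: image paths obtained by resolving config.yml now.
    run: the existing run, which records how many images it ingested.
    other_settings_changed: result of the existing settings diff (assumed given).
    """
    # Add mode requires that nothing other than the inputs has changed.
    if other_settings_changed:
        return False
    # More resolved inputs than ingested images means new files have appeared.
    return len(resolved_inputs) > run.n_images
```

For example, with run = Run(n_images=2), passing three resolved paths and other_settings_changed=False returns True, while passing the same two paths returns False. Comparing counts is the simplest form of the idea; it would not notice one file being replaced by another.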
