Description
When a user wishes to add images to an existing pipeline run, they modify the config to include the new inputs and relaunch the run. A check is performed to ensure that:
- New inputs have been added to the config, and
- No other settings have been changed.
Both of these conditions must be true for a pipeline run to be re-run in "add mode". The pipeline checks if the inputs have changed by reading the previous config file config_prev.yml and comparing it with the updated config.yml file. Both config files are parsed, validated, and all glob expressions are resolved.
Suppose that the config inputs are a simple glob expression, e.g.
inputs:
  image:
    glob: /data/vast-survey/VAST/release/EPOCH*/COMBINED/STOKESI_IMAGES/*.fits
If new files that match this expression are added to the filesystem, the pipeline will fail to detect that the inputs have changed. It will read both config_prev.yml and config.yml, which would contain the same glob expression in this case, and compare them. Since the globs are resolved when the config file is read, both config files will end up with the same list of inputs even though new files matching the glob were added since the run was executed.
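The blind spot described above can be reproduced outside the pipeline. The following is a minimal sketch, not the pipeline's own code: `resolve_inputs` and `demo_glob_blind_spot` are hypothetical names, and the directory layout is a simplified stand-in for the real survey paths. It shows that resolving the same glob from both config files at read time yields identical lists, even though a file was added after the run.

```python
import glob
import os
import tempfile


def resolve_inputs(pattern: str) -> list[str]:
    """Resolve a glob expression to a sorted list of paths (hypothetical helper)."""
    return sorted(glob.glob(pattern))


def touch(path: str) -> None:
    """Create an empty file, creating parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


def demo_glob_blind_spot() -> tuple[int, int, bool]:
    with tempfile.TemporaryDirectory() as root:
        # The same glob expression appears in config_prev.yml and config.yml.
        pattern = os.path.join(root, "EPOCH*", "image.fits")

        # Two images exist when the run is first executed.
        touch(os.path.join(root, "EPOCH01", "image.fits"))
        touch(os.path.join(root, "EPOCH02", "image.fits"))
        n_images_used = len(resolve_inputs(pattern))

        # A new matching image appears on the filesystem after the run.
        touch(os.path.join(root, "EPOCH03", "image.fits"))

        # On relaunch, both configs are read now, so both globs resolve now.
        prev_inputs = resolve_inputs(pattern)
        new_inputs = resolve_inputs(pattern)
        return n_images_used, len(new_inputs), prev_inputs == new_inputs
```

The run used two images and three now match the glob, yet the two resolved lists compare equal, so a diff of the parsed configs reports no change.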
The problem is that the config diff check only parses the previous config file and doesn't look at which images were actually used.
A potential solution would be to extend the config diff check to also compare the number of resolved inputs in config.yml with the number of images stored in the Run object (i.e. Run.n_images). If the number of resolved inputs is greater than the number of images in the Run object, then the run should be re-run in add mode. This won't detect the case where images were removed, but that isn't allowed for "add mode" anyway.
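The proposed check could look roughly like the sketch below. This is an illustration under stated assumptions, not the pipeline's implementation: the function name `is_add_mode` and its parameters are hypothetical, and only the idea of comparing the resolved input count against `Run.n_images` comes from the description above.

```python
def is_add_mode(
    prev_inputs: list[str],
    new_inputs: list[str],
    n_images_in_run: int,
    other_settings_changed: bool,
) -> bool:
    """Decide whether a relaunch qualifies for add mode (hypothetical helper).

    prev_inputs / new_inputs: resolved inputs from config_prev.yml / config.yml.
    n_images_in_run: the value of Run.n_images for the existing run.
    """
    # Add mode requires that no other settings have changed.
    if other_settings_changed:
        return False

    # Existing behaviour: the new config lists inputs the previous one did not
    # (strict superset, so nothing was removed).
    config_inputs_added = set(new_inputs) > set(prev_inputs)

    # Proposed addition: more inputs resolve now than the run actually used.
    # This catches new files matching an unchanged glob expression.
    more_inputs_than_run = len(new_inputs) > n_images_in_run

    return config_inputs_added or more_inputs_than_run
```

With an unchanged glob that now matches three files while the run holds two images, `is_add_mode(files, files, 2, False)` returns `True`, whereas a plain config comparison would have reported no change.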