Problem: A user cannot cancel or abort a workflow once initiated #1290

@fiver-watson

Description

The problem

Early Enduro development mostly involved testing with very small SIPs for rapid turnaround. Now that we are testing with larger packages and approaching production implementation for clients, some usability deficits are becoming more apparent.

One of these is the inability to cancel an ingest (or other type of) workflow once it is launched, before it finishes and stores its output (e.g. an AIP). There may be cases where a large SIP ingest is started and an operator then discovers an issue before the workflow completes (for example, for SFA, a Vecteur scanning employee finding a missing page in a large digitization SIP).

Given that:

  • Processing large SIPs of many GBs (or even TBs) can take a long time, and
  • Some institutions use tape storage (especially something like WORM storage, which is write-once)

... it could be both frustrating and potentially expensive not to be able to cancel an in-progress workflow before it completes.

To Reproduce

  1. Try ingesting one of the larger (e.g. the 6GB or bigger) test samples from this folder in Drive
  2. Think about needing to cancel it before an AIP is created and stored

Resulting error

There is no way to cancel / abort / abandon an in-progress workflow, even though an operator may have good reason to do so.

Expected behavior

An option is provided to cancel or otherwise end an ongoing workflow prior to its completion. When triggered, a cleanup process should be launched to remove any artifacts left behind (e.g. in processing directories).
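To make the expected behavior concrete, here is a minimal sketch of cooperative cancellation with cleanup. It is not Enduro code and does not reflect how Enduro is implemented: `run_ingest`, `WorkflowCancelled`, the step functions, and the temporary processing directory are all hypothetical names used for illustration. The idea is that the workflow checks a cancel signal between steps and, when cancelled, removes its processing directory before anything is stored.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class WorkflowCancelled(Exception):
    """Raised when an operator cancels a workflow before it completes."""


def run_ingest(steps, cancel_event, processing_dir):
    """Run steps in order, checking for cancellation before each one.

    Returns the names of the completed steps. If the cancel signal is set,
    the processing directory is removed and WorkflowCancelled is raised.
    """
    completed = []
    try:
        for name, step in steps:
            if cancel_event.is_set():
                raise WorkflowCancelled(f"cancelled before step {name!r}")
            step(processing_dir)
            completed.append(name)
        return completed
    except WorkflowCancelled:
        # Clean up artifacts left behind in the processing directory.
        shutil.rmtree(processing_dir, ignore_errors=True)
        raise


def demo():
    """Simulate an operator cancelling between 'extract' and 'store'."""
    processing_dir = Path(tempfile.mkdtemp(prefix="ingest-"))
    cancel_event = threading.Event()

    def extract(workdir):
        (workdir / "extracted.txt").write_text("payload")

    def operator_cancels(workdir):
        cancel_event.set()  # stand-in for an operator pressing "Cancel"

    def store_aip(workdir):
        raise AssertionError("store step must not run after cancellation")

    steps = [
        ("extract", extract),
        ("review", operator_cancels),
        ("store", store_aip),
    ]
    try:
        run_ingest(steps, cancel_event, processing_dir)
    except WorkflowCancelled:
        return True, processing_dir.exists()
    return False, processing_dir.exists()
```

In this sketch the store step never runs once the cancel signal is set, which is the property that matters most for write-once storage. A real implementation would also need to interrupt a long-running step rather than only checking between steps.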

Additional context

This is in part related to other reliability and recoverability work, such as the transfer integrity work done for STE/NLBS, or even some of the Legacy Enduro features.

Legacy Enduro had UI options to manually launch a Retry when a workflow halted due to a system error - the idea being that a developer or system administrator might fix an upstream issue (such as a disk full problem or temporary network interruption), and the user can then retry the workflow without needing to fully abandon it. In case multiple retries produced no better outcome, an "Abandon" option was also provided.

Some similar Retry functionality was added to the STE Enduro to support transfer integrity across air-gapped zones.

SDPS Enduro has (or at least had until recently) an unused "Abandon" status in its enums, but I am not sure that "Abandon" (after failed retries) really needs to be a separate concept from "Cancel". Ideally, however, work on a Cancel option in Enduro might lead to related future work, such as:

  • manual retries for errors
  • the ability to clear errored ingests from the user interface
  • etc
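One way to picture how Cancel could cover the old "Abandon" case and leave room for manual retries is a small table of allowed status transitions. This is only an illustrative sketch: the `Status` values and the `ALLOWED` table below are assumptions made for the example and are not taken from Enduro's actual enums.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical statuses, not Enduro's real enum values.
    IN_PROGRESS = "in progress"
    ERROR = "error"
    DONE = "done"
    CANCELLED = "cancelled"


# Which target statuses are reachable from each current status.
ALLOWED = {
    Status.IN_PROGRESS: {Status.DONE, Status.ERROR, Status.CANCELLED},
    # From an error, an operator may retry or give up; giving up reuses
    # CANCELLED instead of a separate "Abandon" status.
    Status.ERROR: {Status.IN_PROGRESS, Status.CANCELLED},
    Status.DONE: set(),
    Status.CANCELLED: set(),
}


def transition(current, target):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Here a manual retry is the `ERROR` to `IN_PROGRESS` transition, and cancelling is available both while a workflow is running and after it has errored, so no distinct "Abandon" status is needed in this model.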

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Status

🛠 Refining

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests