Improve error reporting/handling in batch submission mode #61

@mdmosby

Description

When canary submits a batch, any error output from the submission is lost: it is neither reported to the console nor saved in the .canary directory for triage. The user can manually execute the generated script to see the output, but it would be better to capture this output in a file and/or include it in the reported exception.
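
One way the requested capture could look is sketched below. It is not canary's or hpc-connect's actual code: `submit_batch`, `SubmissionError`, and the `submit.log` file name are hypothetical names chosen for illustration, and the sketch assumes the generated script is launched as a subprocess. It uses only the standard library.

```python
import subprocess
from pathlib import Path


class SubmissionError(RuntimeError):
    """Hypothetical exception that carries the scheduler's error output."""


def submit_batch(command: list[str], log_dir: Path) -> str:
    """Run a submission command, save its output to a file, and raise on failure.

    `command` and `log_dir` are assumptions for this sketch; in practice the
    command would be the generated submission script and `log_dir` a
    directory under .canary.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(command, capture_output=True, text=True)

    # Keep both streams on disk so a failed submission can be triaged later.
    log_file = log_dir / "submit.log"
    log_file.write_text(
        f"$ {' '.join(command)}\n"
        f"exit code: {proc.returncode}\n"
        f"--- stdout ---\n{proc.stdout}\n"
        f"--- stderr ---\n{proc.stderr}\n"
    )

    if proc.returncode != 0:
        # Include the scheduler's message in the exception itself.
        raise SubmissionError(
            f"submission failed with exit code {proc.returncode} "
            f"(see {log_file}):\n{proc.stderr.strip()}"
        )
    return proc.stdout
```

With something like this, a rejected submission (for example, a missing account) would show the scheduler's message in the traceback and leave a copy on disk.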

In addition to the error reporting problem, the failure case (at least for scheduler=slurm) appears to hang and requires a KeyboardInterrupt, even when all test batches fail or error on submission. The expected behavior is that canary run exits and reports as usual, with tests reported as 'not run', once the final batch completes. Since every batch failed on submit, this should happen immediately.
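
The expected behavior can be illustrated with a minimal sketch. It does not reflect canary's internals: `Batch`, `run_batches`, and the `submit` and `wait` callables are all hypothetical stand-ins. The point is only that a batch whose submission fails is marked 'not run' right away and is never waited on.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Hypothetical stand-in for a batch of test cases."""
    name: str
    tests: list[str]
    status: dict[str, str] = field(default_factory=dict)


def run_batches(batches, submit, wait):
    """Submit each batch and wait only on batches that were actually submitted.

    `submit(batch)` is assumed to return a job id or raise on failure;
    `wait(batch, job_id)` is assumed to block until the job finishes.
    """
    pending = []
    for batch in batches:
        try:
            job_id = submit(batch)
        except Exception as exc:
            # The job never reached the scheduler, so there is nothing to
            # poll: record the outcome now instead of waiting for it.
            for test in batch.tests:
                batch.status[test] = "not run"
            print(f"{batch.name}: submission failed: {exc}")
            continue
        pending.append((batch, job_id))

    # If every submission failed, `pending` is empty and this loop is skipped,
    # so the run finishes immediately.
    for batch, job_id in pending:
        wait(batch, job_id)
    return batches
```

In the scenario from this report, every call to `submit` would raise, so `wait` would never be called and the run could go straight to reporting.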

Steps to reproduce

  1. Target a machine that requires specification of a valid option to a scheduler (e.g., account)
  2. Do not specify the required option, causing all batches to fail at launch

Example command

$ canary run -b scheduler=slurm ./examples

Note the exclusion of -b option="--account=xyz".

Version information

$ pip list
Package     Version
----------- -------
canary-wm   25.10.7
hpc-connect 25.10.7
Jinja2      3.1.6
MarkupSafe  3.0.3
pip         24.0
pluggy      1.6.0
psutil      7.1.1
PyYAML      6.0.3
schema      0.7.8
