Unable to resume sync after job cancelled by time constraints #182

@aaronkanzer

@kabilar @yarikoptic @jwodder @satra

Note: I haven't removed or edited any files in the Engaging location yet; I'd like to discuss first.

Within the MIT Engaging cluster, jobs can run for at most 12 hours. The logs below show that a download was in progress when the job was shut down while processing s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/

2025-03-26T20:19:01.000372681-04:00  INFO process_item{url=s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.15}: s3invsync::syncer: Finished processing object
2025-03-26T20:19:01.000508644-04:00  INFO process_item{url=s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.22}: s3invsync::syncer: Processing object
2025-03-26T20:19:01.000641888-04:00  INFO process_item{url=s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.22}: s3invsync::syncer: Object is latest version of key
2025-03-26T20:19:01.000916407-04:00  INFO process_item{url=s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.22}: s3invsync::syncer: Backup path does not exist; will download path=/orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.22
2025-03-26T20:19:01.001352984-04:00 DEBUG process_item{url=s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.22}:download_item:download_object{url=s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.22}: s3invsync::s3: Downloading object to disk
slurmstepd: error: *** JOB 64590752 ON node1802 CANCELLED AT 2025-03-26T20:19:01 DUE TO TIME LIMIT ***

On the next run of the script (and on every retry since), the errors below occur on the same asset. It seems there is no logic in place to handle this edge case, and the script then exits.

2025-03-27T08:48:21.940691055-04:00 ERROR s3invsync::syncer: Error occurred error=failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.17

Caused by:
    No entry for "0.42.1.17" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
2025-03-27T08:48:21.958987484-04:00 ERROR s3invsync::syncer: Error occurred error=failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.18

Caused by:
    No entry for "0.42.1.18" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
2025-03-27T08:48:21.97595937-04:00 ERROR s3invsync::syncer: Error occurred error=failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.19

Caused by:
    No entry for "0.42.1.19" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
2025-03-27T08:48:22.000279032-04:00 ERROR s3invsync::syncer: Error occurred error=failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.2

Caused by:
    No entry for "0.42.1.2" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
2025-03-27T08:48:22.019417743-04:00 ERROR s3invsync::syncer: Error occurred error=failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.20

Caused by:
    No entry for "0.42.1.20" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
2025-03-27T08:48:22.038714645-04:00 ERROR s3invsync::syncer: Error occurred error=failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.21

Caused by:
    No entry for "0.42.1.21" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
Error: 7 ERRORS:
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.16

Caused by:
    No entry for "0.42.1.16" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.17

Caused by:
    No entry for "0.42.1.17" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.18

Caused by:
    No entry for "0.42.1.18" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.19

Caused by:
    No entry for "0.42.1.19" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.2

Caused by:
    No entry for "0.42.1.2" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.20

Caused by:
    No entry for "0.42.1.20" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
---
failed to get local metadata for s3://linc-brain-mit-prod-us-east-2/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/0.42.1.21

Caused by:
    No entry for "0.42.1.21" in /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0/.s3invsync.versions.json
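
To help scope any cleanup before touching files, here is a minimal Python sketch (not part of s3invsync) that lists files in a backup directory with no entry in that directory's .s3invsync.versions.json. It assumes the manifest is a JSON object keyed by file name, which I have not verified against the s3invsync source; the function name find_untracked_files is hypothetical.

```python
import json
from pathlib import Path

# Manifest file name taken from the error messages above.
MANIFEST_NAME = ".s3invsync.versions.json"


def find_untracked_files(directory):
    """Return sorted names of regular files in `directory` that have no
    key in the directory's manifest.

    ASSUMPTION: the manifest is a JSON object whose keys are file names.
    """
    directory = Path(directory)
    manifest_path = directory / MANIFEST_NAME
    tracked = set()
    if manifest_path.exists():
        with manifest_path.open() as fp:
            data = json.load(fp)
        if not isinstance(data, dict):
            raise ValueError(f"unexpected manifest layout in {manifest_path}")
        tracked = set(data)  # keys of the JSON object
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name != MANIFEST_NAME and p.name not in tracked
    )
```

Calling find_untracked_files on the /orcd/data/linc/001/s3lincbrain/zarr/2f82859f-dae5-45df-8b12-5fd7b38497ff/0 directory should, if the assumption holds, return the names reported in the errors above. It only reads the directory and changes nothing.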
