
publishDir compatibility with Azure Blob Storage #4683

@endre-seqera

Description


Bug report

Expected behavior and actual behavior

Expected behavior: the publishDir directive should work with Azure paths written in different formats.
Actual behavior: publishDir fails for:

  • Azure URLs starting with https://
  • az:// paths that include the storage account name, i.e. az://<storage-account>.<bucket>

Steps to reproduce the problem

  • set up Nextflow with Azure (basic setup)
  • run the nf-canary pipeline
  • pass in differently formatted paths for params.outdir

Working example:

nextflow run https://github.com/seqeralabs/nf-canary -r main -w az://nf-scratch/work --outdir "az://test-public" # succeeds

Failing example 1 - storage account in the path:

nextflow run https://github.com/seqeralabs/nf-canary -r main -w az://nf-scratch/work --outdir "az://nfazurestore.test-public" # fail
ERROR ~ Error executing process > 'NF_CANARY:TEST_PUBLISH_FOLDER'

Caused by:
  /nfazurestore.test-public: Unable to determine if root directory exists

Failing example 2 - https path used:

nextflow run https://github.com/seqeralabs/nf-canary -r main -w az://nf-scratch/work --outdir "https://nfazurestore.blob.core.windows.net/test-public" # fail
ERROR ~ Error executing process > 'NF_CANARY:TEST_PUBLISH_FOLDER'

Caused by:
  Create directory not supported by HTTPS file system provider

The root cause of both failures:

  • first, FileHelper.groovy transforms the path into its canonicalPath (for example /<storage-account>.<bucket>)
  • then Files.createDirectories(this.path) fails with the error messages shown above
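The following Python sketch illustrates the first failure. It is not Nextflow's code. It assumes the az://<container>/<path> layout, in which everything between "az://" and the first "/" is read as the container name, and checks that name against Azure's container naming rules: 3 to 63 characters, only lowercase letters, digits, and hyphens, starting and ending with a letter or digit, and no consecutive hyphens. Under that assumption, az://nfazurestore.test-public yields a container named nfazurestore.test-public. That name can never be valid because container names cannot contain dots. The helper names split_az_path and is_valid_container_name are hypothetical.

```python
import re
from urllib.parse import urlparse

# Azure Blob Storage container naming rules: 3-63 characters, only lowercase
# letters, digits and hyphens, starting and ending with a letter or digit,
# and no consecutive hyphens. Dots are not allowed.
CONTAINER_RE = re.compile(r"[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]")


def split_az_path(path: str) -> tuple[str, str]:
    """Split az://<container>/<path> into (container, blob path).

    Hypothetical helper illustrating the assumed az:// layout: everything
    between "az://" and the first "/" is treated as the container name.
    """
    parsed = urlparse(path)
    if parsed.scheme != "az":
        raise ValueError(f"not an az:// path: {path}")
    return parsed.netloc, parsed.path.lstrip("/")


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` satisfies the container naming rules above."""
    return CONTAINER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Compare the working and the failing --outdir values from this report.
    for outdir in ("az://test-public", "az://nfazurestore.test-public"):
        container, _ = split_az_path(outdir)
        print(outdir, "->", container, is_valid_container_name(container))
```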

Environment

  • Nextflow version: 23.12.0-edge build 5901
  • Java version: openjdk 21.0.1 2023-10-17 LTS
  • Operating system: macOS Sonoma - 14.2.1 (23C71)
  • Bash version: zsh 5.9 (x86_64-apple-darwin23.0)

Additional context

Rationale for supporting paths that include the storage account name:
Azure container (bucket) names are not globally unique. They are unique only within a storage account. To identify a container unambiguously, Seqera Platform uses the path format az://<storage-account>.<bucket>. Nextflow already knows the storage account name because it must be set in the config. That prefix could therefore easily be stripped from the path, which would fix the issue.
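The normalization proposed above could look roughly like the following Python sketch. It assumes that account_name holds the storage account configured for Nextflow, presumably azure.storage.accountName. The helper strip_account_prefix is hypothetical and not part of Nextflow. Container names cannot contain dots, so the first dot in the authority part unambiguously separates the account name from the container name.

```python
from urllib.parse import urlparse


def strip_account_prefix(path: str, account_name: str) -> str:
    """Rewrite az://<account>.<container>/<path> as az://<container>/<path>.

    Hypothetical helper: `account_name` stands in for the storage account
    configured for Nextflow (assumed to be azure.storage.accountName).
    Container names cannot contain dots, so the first dot in the authority
    part separates the account name from the container name.
    """
    parsed = urlparse(path)
    if parsed.scheme != "az":
        raise ValueError(f"not an az:// path: {path}")
    account, dot, container = parsed.netloc.partition(".")
    if not dot:
        # Already in the az://<container>/... form: nothing to strip.
        return path
    if account != account_name:
        # The path targets another storage account; refuse to rewrite it.
        raise ValueError(
            f"path uses storage account '{account}', "
            f"but '{account_name}' is configured"
        )
    return f"az://{container}{parsed.path}"
```

This sketch raises an error on an account mismatch instead of silently dropping the prefix. A mismatched path points at a container in a different storage account from the one Nextflow has credentials for, so rewriting it would publish results to the wrong place.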

Rationale for supporting https paths:
The Azure documentation on referencing blobs suggests using a URL of the form https://<storage-account>.blob.core.windows.net/<bucket>.
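A https:// blob URL could similarly be mapped to the az:// form before publishing, which would sidestep the HTTPS file system provider entirely. The following Python sketch shows one way to do this under stated assumptions. The function https_to_az is hypothetical and not a Nextflow function. It handles only the public-cloud endpoint suffix .blob.core.windows.net, and it ignores any query string, such as a SAS token.

```python
from urllib.parse import urlparse

# Assumption: only the Azure public-cloud blob endpoint suffix is handled.
BLOB_SUFFIX = ".blob.core.windows.net"


def https_to_az(url: str, account_name: str) -> str:
    """Map https://<account>.blob.core.windows.net/<container>/<path>
    to az://<container>/<path>.

    Hypothetical helper; `account_name` stands in for the configured storage
    account. Any query string (e.g. a SAS token) is ignored.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"not an Azure Blob Storage URL: {url}")
    account = host[: -len(BLOB_SUFFIX)]
    if account != account_name:
        # Same policy as for az:// paths: never rewrite across accounts.
        raise ValueError(
            f"URL uses storage account '{account}', "
            f"but '{account_name}' is configured"
        )
    # The first path segment is the container; the rest is the blob path.
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"URL does not name a container: {url}")
    return f"az://{container}/{blob_path}" if blob_path else f"az://{container}"
```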
