-
Notifications
You must be signed in to change notification settings - Fork 714
feat: remove storage account name if present from azure path #4692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: remove storage account name if present from azure path #4692
Conversation
✅ Deploy Preview for nextflow-docs-staging canceled.
|
Signed-off-by: endre-seqera <endre.sukosd@seqera.io>
aee2c56
to
fcd676f
Compare
plugins/nf-azure/src/main/nextflow/cloud/azure/file/AzPathFactory.groovy
Outdated
Show resolved
Hide resolved
Signed-off-by: endre-seqera <endre.sukosd@seqera.io>
f64509e
to
4704d95
Compare
Provided this may work, i'm bit concerned altering the container aka bucket name under the hood. This could result having nextflow reporting a different object file path compared to the one specified by the user, in the trace files, report, and other provenance record. It may be worth to add full support for it |
I understand your concern, but the bucket (container) name is not altered, only the additional information (storage account name) is removed. So object file path will still be correct, just missing this extra information. But this extra information is redundant in a way, because it is already known to the user, since the user has to configure it with their azure credentials in the
The "convention" used in SeqeraPlatform But no one is expecting azure paths in this format, and for a given, specific nextflow configuration (or for one specific credential) this information (storage account name) is not necessary. |
456c4c3
to
4704d95
Compare
Agreed, if we can at least click the data explorer interface to add a file or dir as input this would make the platform compatible. Extending this to actually support multiple storage accounts would be the next step: #4445 (comment) |
Hi there, this still seems to be an issue. I've had this problem when selecting my data in seqera platform from the data explorer. The path would be parsed as |
5a93547
to
27345a6
Compare
b4b321e
to
069653d
Compare
@endre-seqera @bentsherman @pditommaso Do we need to sit down and work through this to get to resolution next week? Customers are encountering this and it would be good to resolve/merge. Thanks! |
I still skeptic about this approach, especially considering a future goal is to allow the use of multiple blob containers across different accounts, therefore the account name information should not stripped from the URI. I have a tentative implementation retaining the account name in the URI that I'll upload at my earliest convenience. it's also required to make sure both azcopy and Fusion are able to handle this change. @jordeu any clue about the latter ? |
Right now, Fusion gets the storage account from an environment variable only. If necessary, we could find other solutions. But I'd like to challenge the need to make the URI universal with all the keys to identify a data source. There are other similar scenarios where we need more pieces to identify a source, and right now, Nextflow is not flexible enough. Let me explain an example, imagine that we want to use the same pipeline for data that we've in a local MinIO S3 storage, with data that is at AWS S3 or data on LakeFS S3. Currently, this is not possible with Nextflow because we identify the "source" from the URI schema (i.e What if it was possible to define our own "schemes" at the Nextflow config level? Something like:
Then all the paths that start with This will allow other use cases, like overwriting the "source" location. Example of general config:
That with a profile can be overwritten by a local copy of the source "refgenomes":
So we could always use the path I think that this concept of "sources" maps very well to the concept of "data link" that is associated with the Seqera platform. Overall, what I mean is that what we are trying to solve now here for Azure is a more generic problem that we'll face in other use cases that want more flexibility, mixing multiple data sources in a single pipeline. |
Think this is a good idea but beyond the goal of this ticket, that consists on supporting the syntax |
Closes: #4683
Remove storage account name if present from azure path
Tests
Tested using nextflow-publishdir pipeline.
Before:
After: