Skip to content

Sanitize labels in AWS batch #6211

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

edmundmiller
Copy link
Member

This should prevent errors such as nf-core/hic#224.

I think this goes with the Nextflow ethos to keep the user from thinking about the nuances of different executors.

But I could also see this not being "expected".

@edmundmiller edmundmiller requested a review from adamrtalbot June 23, 2025 15:04
@edmundmiller edmundmiller self-assigned this Jun 23, 2025
Copy link

netlify bot commented Jun 23, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 4d70df3
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/68596d982095c90008a15dc6

…TaskHandler.

Added methods to sanitize individual labels and a map of labels to comply with AWS constraints. Updated tests to verify sanitization functionality.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Refined expected outputs for labels with brackets, unicode characters, and special characters to ensure compliance with AWS constraints. Adjusted test cases for length truncation and null handling to reflect accurate sanitization behavior.

Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Copy link
Collaborator

@adamrtalbot adamrtalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let better Groovy wizards comment on the code, but I think this is a worthwhile feature to make it "just work". From my perspective I want to know:

  • If a user puts an invalid label in, how do they know it was dropped?
  • Are there weird situations where a label is bypassed?

Comment on lines +875 to +878
// Only add non-empty keys and values
if (sanitizedKey && sanitizedValue) {
result.put(sanitizedKey, sanitizedValue)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will silently drop keys and values that can't be sanitised, which is a bit bad in terms of UX.

At a minimum, I would raise a warning here, but I might be tempted to throw an error and tell the user to specify a valid label.


// Replace invalid characters with underscores
// AWS Batch allows: letters, numbers, spaces, and: _ . : / = + - @
String sanitized = input.replaceAll(/[^a-zA-Z0-9\s_.\:\/=+\-@]/, '_')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Untested, but I think you can simplify this to:

Suggested change
String sanitized = input.replaceAll(/[^a-zA-Z0-9\s_.\:\/=+\-@]/, '_')
String sanitized = input.replaceAll(/[^a-zA-Z0-9-_]+/,'_')

you might also be able to fold the length into a single chain:

Suggested change
String sanitized = input.replaceAll(/[^a-zA-Z0-9\s_.\:\/=+\-@]/, '_')
String sanitized = input
.trim()
.take(maxLength)
.replaceAll(/[^a-zA-Z0-9-_]+/, '_')

Copy link
Collaborator

@adamrtalbot adamrtalbot Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course changing the order affects the output - you may want to replaceAll then take based on what you want.

Searching through the codebase it tends to be a ternary with substring:

name.size() > 256 ? name.substring(0,256) : name
}

Comment on lines +900 to +911
// Replace multiple consecutive underscores/spaces with single underscore
sanitized = sanitized.replaceAll(/[_\s]{2,}/, '_')

// Remove leading/trailing underscores and spaces
sanitized = sanitized.replaceAll(/^[_\s]+|[_\s]+$/, '')

// Truncate if too long
if (sanitized.length() > maxLength) {
sanitized = sanitized.substring(0, maxLength)
// Remove trailing underscore/space after truncation
sanitized = sanitized.replaceAll(/[_\s]+$/, '')
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably simplify this a little bit by trimming first then replacing (see chain example above).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to simplify or make it readable?

I'm all for clever code, but I want to make it clear what we're doing and trying to overcome here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing the same variable a bunch of times can be hard to read as well because it's difficult to tell what the variable is at any one time. I wouldn't stack all of these changes into one chain but I would consider simplifying the code a bit. The best idea is to break it down into discrete chunks of obvious code, although that's not always easy to do.

final key = entry.getKey()
final value = entry.getValue()

if (key != null && value != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value is null, this is skipped. Currently, when the item is "item": null is the aws tag silently dropped?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants