-
Notifications
You must be signed in to change notification settings - Fork 707
Sanitize labels in AWS batch #6211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for nextflow-docs-staging canceled.
|
…TaskHandler. Added methods to sanitize individual labels and a map of labels to comply with AWS constraints. Updated tests to verify sanitization functionality. Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
Refined expected outputs for labels with brackets, unicode characters, and special characters to ensure compliance with AWS constraints. Adjusted test cases for length truncation and null handling to reflect accurate sanitization behavior. Signed-off-by: Edmund Miller <edmund.miller@seqera.io>
32540d2
to
4d70df3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will let better Groovy wizards comment on the code, but I think this is a worthwhile feature to make it "just work". From my perspective I want to know:
- If a user puts an invalid label in, how do they know it was dropped?
- Are there weird situations where a label is bypassed?
// Only add non-empty keys and values | ||
if (sanitizedKey && sanitizedValue) { | ||
result.put(sanitizedKey, sanitizedValue) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will silently drop keys and values that can't be sanitised, which is a bit bad in terms of UX.
At a minimum, I would raise a warning here, but I might be tempted to throw an error and tell the user to specify a valid label.
|
||
// Replace invalid characters with underscores | ||
// AWS Batch allows: letters, numbers, spaces, and: _ . : / = + - @ | ||
String sanitized = input.replaceAll(/[^a-zA-Z0-9\s_.\:\/=+\-@]/, '_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Untested, but I think you can simplify this to:
String sanitized = input.replaceAll(/[^a-zA-Z0-9\s_.\:\/=+\-@]/, '_') | |
String sanitized = input.replaceAll(/[^a-zA-Z0-9-_]+/,'_') |
you might also be able to fold the length into a single chain:
String sanitized = input.replaceAll(/[^a-zA-Z0-9\s_.\:\/=+\-@]/, '_') | |
String sanitized = input | |
.trim() | |
.take(maxLength) | |
.replaceAll(/[^a-zA-Z0-9-_]+/, '_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course changing the order affects the output - you may want to replaceAll then take based on what you want.
Searching through the codebase it tends to be a ternary with substring:
nextflow/modules/nextflow/src/main/groovy/nextflow/executor/AbstractGridExecutor.groovy
Lines 182 to 183 in 6231d76
name.size() > 256 ? name.substring(0,256) : name | |
} |
// Replace multiple consecutive underscores/spaces with single underscore | ||
sanitized = sanitized.replaceAll(/[_\s]{2,}/, '_') | ||
|
||
// Remove leading/trailing underscores and spaces | ||
sanitized = sanitized.replaceAll(/^[_\s]+|[_\s]+$/, '') | ||
|
||
// Truncate if too long | ||
if (sanitized.length() > maxLength) { | ||
sanitized = sanitized.substring(0, maxLength) | ||
// Remove trailing underscore/space after truncation | ||
sanitized = sanitized.replaceAll(/[_\s]+$/, '') | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably simplify this a little bit by trimming first then replacing (see chain example above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to simplify or make it readable?
I'm all for clever code, but I want to make it clear what we're doing and trying to overcome here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replacing the same variable a bunch of times can be hard to read as well because it's difficult to tell what the variable is at any one time. I wouldn't stack all of these changes into one chain but I would consider simplifying the code a bit. The best idea is to break it down into discrete chunks of obvious code, although that's not always easy to do.
final key = entry.getKey() | ||
final value = entry.getValue() | ||
|
||
if (key != null && value != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the value is null
, this is skipped. Currently, when the item is "item": null
is the aws tag silently dropped?
This should prevent errors such as nf-core/hic#224.
I think this goes with the Nextflow ethos to keep the user from thinking about the nuances of different executors.
But I could also see this not being "expected".