-
Notifications
You must be signed in to change notification settings - Fork 0
Refactor/niagara #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/niagara #17
Conversation
""" WalkthroughA new Nextflow workflow for processing NIfTI and CSV files was introduced, with logic to pair files based on labels and handle missing CSVs. The main workflow in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Nextflow
participant MAGeTBrain
participant collectAndCombineVolumes
participant collectVolumes
participant combineVolumes
User->>Nextflow: Run workflow
Nextflow->>MAGeTBrain: Execute main workflow
MAGeTBrain->>collectAndCombineVolumes: Pass majorityVoteOutput
collectAndCombineVolumes->>collectVolumes: Pair NIfTI & CSV files, handle missing CSVs
collectVolumes->>combineVolumes: Collect volumes
combineVolumes-->>collectAndCombineVolumes: Combine and finalize volumes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
nextflow.config (1)
5-6
: Consider using production directory names instead of test directories.The directory parameters
input_test2
andoutput_test3
suggest this configuration is for testing. Ensure these are updated to production values before merging to avoid confusion.collect_and_combine_volumes_niagara.nf (2)
8-8
: Consider making the regex pattern more specific.The regex pattern
/_label_([\w]+)\.nii.gz/
uses\w
which matches letters, digits, and underscores. Verify this matches your expected label format and consider making it more specific if needed (e.g.,/_label_([a-zA-Z0-9]+)\.nii.gz/
if underscores aren't expected in labels).
4-4
: Add validation for the output directory path.Consider adding validation to ensure the
${params.outputDir}/labels/majorityvote/
directory exists before attempting to process files.+ // Validate output directory exists + def outputPath = file("${params.outputDir}/labels/majorityvote/") + if (!outputPath.exists() || !outputPath.isDirectory()) { + error "Output directory does not exist: ${outputPath}" + } + niftiFiles = Channel.fromPath("${params.outputDir}/labels/majorityvote/*.nii.gz")magetbrain.nf (1)
277-279
: Clarify the purpose of emitting null.The workflow emits
null
which seems unusual. If the workflow doesn't need to emit any output, consider removing the emit block entirely or documenting why null is being emitted.- emit: - null + // No emit block needed if no output is required
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
collect_and_combine_volumes_niagara.nf
(1 hunks)magetbrain.nf
(3 hunks)nextflow.config
(2 hunks)
🔇 Additional comments (5)
nextflow.config (2)
23-46
: Verify that memory removal from process configurations is intentional.Memory specifications were removed from all processes in the niagara profile. Ensure this is intentional and that SLURM will handle memory allocation appropriately, or if default memory limits should be restored.
32-32
:✅ Verification successful
Verify process name consistency across all workflow files.
The process name was changed from
registerNonLinear
toregisterNonlinear
. Ensure this naming is consistent in all Nextflow scripts that reference this process.
🏁 Script executed:
#!/bin/bash # Description: Check for any remaining references to the old process name # Expected: No occurrences of 'registerNonLinear' in workflow files echo "Searching for old process name 'registerNonLinear':" rg -i "registerNonLinear" --type=nf echo -e "\nSearching for new process name 'registerNonlinear':" rg -i "registerNonlinear" --type=nfLength of output: 369
🏁 Script executed:
#!/bin/bash # Description: Correctly search for process name references in Nextflow scripts (.nf files) echo "Searching for old process name 'registerNonLinear' in .nf files:" rg -i "registerNonLinear" --glob '*.nf' || echo "No matches found for 'registerNonLinear'" echo -e "\nSearching for new process name 'registerNonlinear' in .nf files:" rg -i "registerNonlinear" --glob '*.nf' || echo "No matches found for 'registerNonlinear'"Length of output: 1055
🏁 Script executed:
#!/bin/bash # Description: Case-sensitive search for exact process names in Nextflow scripts echo "Searching for old process name 'registerNonLinear' (case-sensitive):" rg --fixed-strings --glob '*.nf' "registerNonLinear" || echo "No occurrences of 'registerNonLinear' found" echo -e "\nSearching for new process name 'registerNonlinear' (case-sensitive):" rg --fixed-strings --glob '*.nf' "registerNonlinear" || echo "No occurrences of 'registerNonlinear' found"Length of output: 860
Process name consistency confirmed
No occurrences of the oldregisterNonLinear
remain; all.nf
workflow scripts useregisterNonlinear
consistently.collect_and_combine_volumes_niagara.nf (1)
16-16
:✅ Verification successful
Verify downstream handling of "NO_FILE" placeholder.
Ensure that the
collectVolumes
andcombineVolumes
functions properly handle the "NO_FILE" placeholder for missing CSV files, or consider using a more explicit approach likenull
or an empty file.
🏁 Script executed:
#!/bin/bash # Description: Check how "NO_FILE" is handled in the imported functions # Expected: Proper handling of this placeholder value echo "Searching for 'NO_FILE' handling in collect_and_combine_volumes.nf:" rg -A 5 -B 5 "NO_FILE" collect_and_combine_volumes.nf echo -e "\nSearching for file existence checks or validation:" rg -A 3 -B 3 "exists\(\)|isFile\(\)" collect_and_combine_volumes.nfLength of output: 836
🏁 Script executed:
#!/bin/bash # Extract the full definitions of the two processes to see how they handle an empty labelCsvArg echo "=== process collectVolumes ===" rg -R "process collectVolumes" -A50 -B2 collect_and_combine_volumes_niagara.nf echo -e "\n=== process combineVolumes ===" rg -R "process combineVolumes" -A50 -B2 collect_and_combine_volumes_niagara.nfLength of output: 1119
🏁 Script executed:
#!/bin/bash # Correctly extract the definitions of the Nextflow processes to inspect handling of an empty labelCsvArg echo "=== process collectVolumes ===" rg "^process\s+collectVolumes" -A100 collect_and_combine_volumes_niagara.nf || true echo -e "\n=== process combineVolumes ===" rg "^process\s+combineVolumes" -A100 collect_and_combine_volumes_niagara.nf || trueLength of output: 317
🏁 Script executed:
#!/bin/bash # Search for "NO_FILE" occurrences in Nextflow scripts echo "=== NO_FILE occurrences ===" rg -n "NO_FILE" --glob "*.nf" # Locate process definitions in collect_and_combine_volumes_niagara.nf echo -e "\n=== process collectVolumes definition ===" rg -n "^process\s+collectVolumes" collect_and_combine_volumes_niagara.nf -A20 echo -e "\n=== process combineVolumes definition ===" rg -n "^process\s+combineVolumes" collect_and_combine_volumes_niagara.nf -A20Length of output: 830
🏁 Script executed:
#!/bin/bash # Show the top 100 lines of the module to inspect both functions echo "=== collect_and_combine_volumes.nf (lines 1-100) ===" sed -n '1,100p' collect_and_combine_volumes.nfLength of output: 1162
NO_FILE placeholder is already handled correctly
ThecollectVolumes
process mapslabelCsv.name != 'NO_FILE'
to an empty string, so the downstream script never sees the"NO_FILE"
literal. ThecombineVolumes
process only concatenates the generated TSVs and does not consume the placeholder, so no additional handling is required.– collectVolumes:
• Usesdef labelCsvArg = labelCsv.name != 'NO_FILE' ? "${labelCsv}" : ""
• Passes an empty string instead of the placeholder tocollect_volumes_nifti.sh
– combineVolumes:
• Operates solely on the produced TSV files; it never receives"NO_FILE"
No changes needed in these processes. Please just confirm that
collect_volumes_nifti.sh
correctly handles being called without a label argument.magetbrain.nf (2)
1-1
: LGTM - DSL directive fix.Good fix removing the extra space in the DSL directive.
309-309
:✅ Verification successful
Verify the workflow integration.
Ensure that calling
collectAndCombineVolumes
withmajorityVoteOutput
works correctly, sincemajorityVoteOutput
is the result of theMAGeTBrain
workflow which should contain the output frommajorityVote.out
.
🏁 Script executed:
#!/bin/bash # Description: Check the structure of majorityVote output and workflow integration # Expected: Proper channel structure for workflow chaining echo "Checking majorityVote process output structure:" rg -A 10 -B 5 "process majorityVote" magetbrain.nf echo -e "\nChecking MAGeTBrain workflow emit:" rg -A 5 -B 5 "emit.*majorityVoteOutput" magetbrain.nfLength of output: 633
I’ll pull in the surrounding context and check where
majorityVoteOutput
andcollectAndCombineVolumes
are defined to confirm the channel wiring.
🏁 Script executed:
#!/bin/bash # Show lines 280–350 with numbers echo "Lines 280–350 in magetbrain.nf:" nl -ba magetbrain.nf | sed -n '280,350p' # Find where collectAndCombineVolumes is called/defined echo -e "\ncollectAndCombineVolumes occurrences:" rg -n -C3 "collectAndCombineVolumes" magetbrain.nf # Find any declaration or emit of majorityVoteOutput echo -e "\nmajorityVoteOutput references:" rg -n -C3 "majorityVoteOutput" magetbrain.nfLength of output: 1558
Workflow Integration Verified
The
majorityVoteOutput
channel is correctly emitted by the MAGeTBrain workflowemit: majorityVoteOutput = majorityVote.outand consumed by collectAndCombineVolumes via its
take:
directiveworkflow collectAndCombineVolumes { take: majorityVoteOutput … }
The top-level invocation
majorityVoteOutput = MAGeTBrain(atlases, labels, templates, subjects) collectAndCombineVolumes(majorityVoteOutput)wires the channel properly. No changes required.
workflow collectAndCombineVolumes{ | ||
take: majorityVoteOutput | ||
|
||
main: | ||
majorityVoteOutput | ||
.map { a_file -> | ||
def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/ | ||
if (matcher.find()) { | ||
def label = matcher.group(1) | ||
def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv") | ||
|
||
if (csvFile.exists()) { | ||
return [csvFile, a_file] | ||
} else { | ||
return [file("NO_FILE"), a_file] | ||
|
||
} | ||
} else { | ||
// if the file match does not exists null will be returned | ||
// nextflow automatically handles nulls | ||
return null | ||
} | ||
} | ||
.set { filesToProcess } | ||
// collect the volumes and combine results | ||
volumes = collectVolumes(filesToProcess) | ||
// after all files process they will be collected | ||
combineVolumes(volumes.collect()) | ||
|
||
emit: | ||
null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider extracting shared logic to reduce code duplication.
The collectAndCombineVolumes
workflow contains nearly identical logic to the workflow in collect_and_combine_volumes_niagara.nf
. Consider extracting the file mapping logic into a reusable function or module to follow the DRY principle.
Example approach:
+// Extract the mapping logic to a reusable function
+def mapNiftiToCsv(niftiFiles) {
+ return niftiFiles.map { a_file ->
+ def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/
+ if (matcher.find()) {
+ def label = matcher.group(1)
+ def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv")
+
+ if (csvFile.exists()) {
+ return [csvFile, a_file]
+ } else {
+ return [file("NO_FILE"), a_file]
+ }
+ } else {
+ return null
+ }
+ }
+}
+
workflow collectAndCombineVolumes{
take: majorityVoteOutput
main:
- majorityVoteOutput
- .map { a_file ->
- def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/
- if (matcher.find()) {
- def label = matcher.group(1)
- def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv")
-
- if (csvFile.exists()) {
- return [csvFile, a_file]
- } else {
- return [file("NO_FILE"), a_file]
- }
- } else {
- return null
- }
- }
- .set { filesToProcess }
+ filesToProcess = mapNiftiToCsv(majorityVoteOutput)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
workflow collectAndCombineVolumes{ | |
take: majorityVoteOutput | |
main: | |
majorityVoteOutput | |
.map { a_file -> | |
def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/ | |
if (matcher.find()) { | |
def label = matcher.group(1) | |
def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv") | |
if (csvFile.exists()) { | |
return [csvFile, a_file] | |
} else { | |
return [file("NO_FILE"), a_file] | |
} | |
} else { | |
// if the file match does not exists null will be returned | |
// nextflow automatically handles nulls | |
return null | |
} | |
} | |
.set { filesToProcess } | |
// collect the volumes and combine results | |
volumes = collectVolumes(filesToProcess) | |
// after all files process they will be collected | |
combineVolumes(volumes.collect()) | |
emit: | |
null | |
} | |
// Extract the mapping logic to a reusable function | |
def mapNiftiToCsv(niftiFiles) { | |
return niftiFiles.map { a_file -> | |
def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/ | |
if (matcher.find()) { | |
def label = matcher.group(1) | |
def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv") | |
if (csvFile.exists()) { | |
return [csvFile, a_file] | |
} else { | |
return [file("NO_FILE"), a_file] | |
} | |
} else { | |
return null | |
} | |
} | |
} | |
workflow collectAndCombineVolumes{ | |
take: majorityVoteOutput | |
main: | |
filesToProcess = mapNiftiToCsv(majorityVoteOutput) | |
// collect the volumes and combine results | |
volumes = collectVolumes(filesToProcess) | |
// after all files process they will be collected | |
combineVolumes(volumes.collect()) | |
emit: | |
null | |
} |
🤖 Prompt for AI Agents
In magetbrain.nf around lines 248 to 279, the file mapping logic inside the
collectAndCombineVolumes workflow duplicates similar code found in
collect_and_combine_volumes_niagara.nf. To fix this, extract the file mapping
logic that processes majorityVoteOutput into a separate reusable function or
module. Then call this function from both workflows to avoid duplication and
adhere to the DRY principle.
Summary by CodeRabbit
New Features
Bug Fixes
Chores