Refactor/niagara #17

CMonnin · 2025-06-03T23:38:37Z

Summary by CodeRabbit

New Features
- Introduced a new workflow for collecting and combining volume data from NIfTI and CSV files, improving data processing automation.
Bug Fixes
- Corrected a typo in the Nextflow DSL enable directive to ensure proper workflow execution.
Chores
- Updated SLURM executor configuration by standardizing resource requests, adding explicit cluster options and time limits, and removing memory specifications for improved performance and consistency.

coderabbitai · 2025-06-03T23:38:43Z

"""

Walkthrough

A new Nextflow workflow for processing NIfTI and CSV files was introduced, with logic to pair files based on labels and handle missing CSVs. The main workflow in magetbrain.nf was refactored to use this new workflow. Resource allocations and SLURM executor settings in nextflow.config were updated and standardized.

Changes

File(s)	Change Summary
collect_and_combine_volumes_niagara.nf	Added a Nextflow workflow to pair NIfTI and CSV files by label, handling missing CSVs, and process them using imported functions.
magetbrain.nf	Fixed DSL typo; refactored to add and use `collectAndCombineVolumes` workflow after `MAGeTBrain`; removed inline mapping/processing logic.
nextflow.config	Updated SLURM executor config: removed module load, standardized and increased resource allocations with explicit cluster options and time limits; renamed a process.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Nextflow
    participant MAGeTBrain
    participant collectAndCombineVolumes
    participant collectVolumes
    participant combineVolumes

    User->>Nextflow: Run workflow
    Nextflow->>MAGeTBrain: Execute main workflow
    MAGeTBrain->>collectAndCombineVolumes: Pass majorityVoteOutput
    collectAndCombineVolumes->>collectVolumes: Pair NIfTI & CSV files, handle missing CSVs
    collectVolumes->>combineVolumes: Collect volumes
    combineVolumes-->>collectAndCombineVolumes: Combine and finalize volumes

Poem

🐇
In fields of files, I hop with glee,
Pairing labels, as neat as can be.
If a CSV’s lost, I don’t despair—
“NO_FILE” I bring, with gentle care.
With resources set and workflows new,
My Nextflow garden blooms anew!
🌱
"""

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5568705 and 0b5a442.

📒 Files selected for processing (2)

magetbrain.nf (3 hunks)
nextflow.config (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

nextflow.config
magetbrain.nf

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

nextflow.config (1)

5-6: Consider using production directory names instead of test directories.

The directory parameters input_test2 and output_test3 suggest this configuration is for testing. Ensure these are updated to production values before merging to avoid confusion.
collect_and_combine_volumes_niagara.nf (2)
8-8: Consider making the regex pattern more specific.

The regex pattern /_label_([\w]+)\.nii.gz/ uses \w which matches letters, digits, and underscores. Verify this matches your expected label format and consider making it more specific if needed (e.g., /_label_([a-zA-Z0-9]+)\.nii.gz/ if underscores aren't expected in labels).

4-4: Add validation for the output directory path.

Consider adding validation to ensure the ${params.outputDir}/labels/majorityvote/ directory exists before attempting to process files.
+    // Validate output directory exists
+    def outputPath = file("${params.outputDir}/labels/majorityvote/")
+    if (!outputPath.exists() || !outputPath.isDirectory()) {
+        error "Output directory does not exist: ${outputPath}"
+    }
+    
     niftiFiles = Channel.fromPath("${params.outputDir}/labels/majorityvote/*.nii.gz")
magetbrain.nf (1)
277-279: Clarify the purpose of emitting null.

The workflow emits null which seems unusual. If the workflow doesn't need to emit any output, consider removing the emit block entirely or documenting why null is being emitted.
-    emit: 
-        null
+    // No emit block needed if no output is required

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a53a33 and 5568705.

📒 Files selected for processing (3)

collect_and_combine_volumes_niagara.nf (1 hunks)
magetbrain.nf (3 hunks)
nextflow.config (2 hunks)

🔇 Additional comments (5)

nextflow.config (2)

23-46: Verify that memory removal from process configurations is intentional.

Memory specifications were removed from all processes in the niagara profile. Ensure this is intentional and that SLURM will handle memory allocation appropriately, or if default memory limits should be restored.

32-32:
✅ Verification successful

Verify process name consistency across all workflow files.

The process name was changed from registerNonLinear to registerNonlinear. Ensure this naming is consistent in all Nextflow scripts that reference this process.

🏁 Script executed:
#!/bin/bash
# Description: Check for any remaining references to the old process name
# Expected: No occurrences of 'registerNonLinear' in workflow files

echo "Searching for old process name 'registerNonLinear':"
rg -i "registerNonLinear" --type=nf

echo -e "\nSearching for new process name 'registerNonlinear':"
rg -i "registerNonlinear" --type=nf
Length of output: 369

🏁 Script executed:
#!/bin/bash
# Description: Correctly search for process name references in Nextflow scripts (.nf files)

echo "Searching for old process name 'registerNonLinear' in .nf files:"
rg -i "registerNonLinear" --glob '*.nf' || echo "No matches found for 'registerNonLinear'"

echo -e "\nSearching for new process name 'registerNonlinear' in .nf files:"
rg -i "registerNonlinear" --glob '*.nf' || echo "No matches found for 'registerNonlinear'"
Length of output: 1055

🏁 Script executed:
#!/bin/bash
# Description: Case-sensitive search for exact process names in Nextflow scripts

echo "Searching for old process name 'registerNonLinear' (case-sensitive):"
rg --fixed-strings --glob '*.nf' "registerNonLinear" || echo "No occurrences of 'registerNonLinear' found"

echo -e "\nSearching for new process name 'registerNonlinear' (case-sensitive):"
rg --fixed-strings --glob '*.nf' "registerNonlinear" || echo "No occurrences of 'registerNonlinear' found"
Length of output: 860
Process name consistency confirmed
No occurrences of the old registerNonLinear remain; all .nf workflow scripts use registerNonlinear consistently.
collect_and_combine_volumes_niagara.nf (1)

16-16:
✅ Verification successful

Verify downstream handling of "NO_FILE" placeholder.

Ensure that the collectVolumes and combineVolumes functions properly handle the "NO_FILE" placeholder for missing CSV files, or consider using a more explicit approach like null or an empty file.

🏁 Script executed:
#!/bin/bash
# Description: Check how "NO_FILE" is handled in the imported functions
# Expected: Proper handling of this placeholder value

echo "Searching for 'NO_FILE' handling in collect_and_combine_volumes.nf:"
rg -A 5 -B 5 "NO_FILE" collect_and_combine_volumes.nf

echo -e "\nSearching for file existence checks or validation:"
rg -A 3 -B 3 "exists|isFile" collect_and_combine_volumes.nf
Length of output: 836

🏁 Script executed:
#!/bin/bash
# Extract the full definitions of the two processes to see how they handle an empty labelCsvArg
echo "=== process collectVolumes ==="
rg -R "process collectVolumes" -A50 -B2 collect_and_combine_volumes_niagara.nf

echo -e "\n=== process combineVolumes ==="
rg -R "process combineVolumes" -A50 -B2 collect_and_combine_volumes_niagara.nf
Length of output: 1119

🏁 Script executed:
#!/bin/bash
# Correctly extract the definitions of the Nextflow processes to inspect handling of an empty labelCsvArg

echo "=== process collectVolumes ==="
rg "^process\s+collectVolumes" -A100 collect_and_combine_volumes_niagara.nf || true

echo -e "\n=== process combineVolumes ==="
rg "^process\s+combineVolumes" -A100 collect_and_combine_volumes_niagara.nf || true
Length of output: 317

🏁 Script executed:
#!/bin/bash
# Search for "NO_FILE" occurrences in Nextflow scripts
echo "=== NO_FILE occurrences ==="
rg -n "NO_FILE" --glob "*.nf"

# Locate process definitions in collect_and_combine_volumes_niagara.nf
echo -e "\n=== process collectVolumes definition ==="
rg -n "^process\s+collectVolumes" collect_and_combine_volumes_niagara.nf -A20

echo -e "\n=== process combineVolumes definition ==="
rg -n "^process\s+combineVolumes" collect_and_combine_volumes_niagara.nf -A20
Length of output: 830

🏁 Script executed:
#!/bin/bash
# Show the top 100 lines of the module to inspect both functions
echo "=== collect_and_combine_volumes.nf (lines 1-100) ==="
sed -n '1,100p' collect_and_combine_volumes.nf
Length of output: 1162
NO_FILE placeholder is already handled correctly
The collectVolumes process maps labelCsv.name != 'NO_FILE' to an empty string, so the downstream script never sees the "NO_FILE" literal. The combineVolumes process only concatenates the generated TSVs and does not consume the placeholder, so no additional handling is required.

– collectVolumes:
• Uses def labelCsvArg = labelCsv.name != 'NO_FILE' ? "${labelCsv}" : ""
• Passes an empty string instead of the placeholder to collect_volumes_nifti.sh

– combineVolumes:
• Operates solely on the produced TSV files; it never receives "NO_FILE"

No changes needed in these processes. Please just confirm that collect_volumes_nifti.sh correctly handles being called without a label argument.
magetbrain.nf (2)

1-1: LGTM - DSL directive fix.

Good fix removing the extra space in the DSL directive.

309-309:
✅ Verification successful

Verify the workflow integration.

Ensure that calling collectAndCombineVolumes with majorityVoteOutput works correctly, since majorityVoteOutput is the result of the MAGeTBrain workflow which should contain the output from majorityVote.out.

🏁 Script executed:
#!/bin/bash
# Description: Check the structure of majorityVote output and workflow integration
# Expected: Proper channel structure for workflow chaining

echo "Checking majorityVote process output structure:"
rg -A 10 -B 5 "process majorityVote" magetbrain.nf

echo -e "\nChecking MAGeTBrain workflow emit:"
rg -A 5 -B 5 "emit.*majorityVoteOutput" magetbrain.nf
Length of output: 633

I’ll pull in the surrounding context and check where majorityVoteOutput and collectAndCombineVolumes are defined to confirm the channel wiring.

🏁 Script executed:
#!/bin/bash
# Show lines 280–350 with numbers
echo "Lines 280–350 in magetbrain.nf:"
nl -ba magetbrain.nf | sed -n '280,350p'

# Find where collectAndCombineVolumes is called/defined
echo -e "\ncollectAndCombineVolumes occurrences:"
rg -n -C3 "collectAndCombineVolumes" magetbrain.nf

# Find any declaration or emit of majorityVoteOutput
echo -e "\nmajorityVoteOutput references:"
rg -n -C3 "majorityVoteOutput" magetbrain.nf
Length of output: 1558
Workflow Integration Verified

The majorityVoteOutput channel is correctly emitted by the MAGeTBrain workflow
emit:
    majorityVoteOutput = majorityVote.out
and consumed by collectAndCombineVolumes via its take: directive
workflow collectAndCombineVolumes {
  take: majorityVoteOutput
  …
}
The top-level invocation
majorityVoteOutput = MAGeTBrain(atlases, labels, templates, subjects)
collectAndCombineVolumes(majorityVoteOutput)
wires the channel properly. No changes required.

coderabbitai · 2025-06-03T23:45:15Z

magetbrain.nf

+workflow collectAndCombineVolumes{
+  take: majorityVoteOutput
+
+  main:
+    majorityVoteOutput
+        .map { a_file ->
+            def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/
+            if (matcher.find()) {
+                def label = matcher.group(1)
+                def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv")
+
+                if (csvFile.exists()) {
+                    return [csvFile, a_file]  
+                } else {
+                    return [file("NO_FILE"), a_file]  
+
+                }
+            } else {
+                // if the file match does not exists null will be returned
+                // nextflow automatically handles nulls
+                return null  
+            }
+        }
+        .set { filesToProcess }
+    // collect the volumes and combine results
+    volumes = collectVolumes(filesToProcess)
+    // after all files process they will be collected
+    combineVolumes(volumes.collect())
+
+    emit: 
+        null
+    }


🛠️ Refactor suggestion

Consider extracting shared logic to reduce code duplication.

The collectAndCombineVolumes workflow contains nearly identical logic to the workflow in collect_and_combine_volumes_niagara.nf. Consider extracting the file mapping logic into a reusable function or module to follow the DRY principle.

Example approach:

+// Extract the mapping logic to a reusable function +def mapNiftiToCsv(niftiFiles) { + return niftiFiles.map { a_file -> + def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/ + if (matcher.find()) { + def label = matcher.group(1) + def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv") + + if (csvFile.exists()) { + return [csvFile, a_file] + } else { + return [file("NO_FILE"), a_file] + } + } else { + return null + } + } +} + workflow collectAndCombineVolumes{ take: majorityVoteOutput main: - majorityVoteOutput - .map { a_file -> - def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/ - if (matcher.find()) { - def label = matcher.group(1) - def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv") - - if (csvFile.exists()) { - return [csvFile, a_file] - } else { - return [file("NO_FILE"), a_file] - } - } else { - return null - } - } - .set { filesToProcess } + filesToProcess = mapNiftiToCsv(majorityVoteOutput)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

workflow collectAndCombineVolumes{

take: majorityVoteOutput

main:

majorityVoteOutput

.map { a_file ->

def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/

if (matcher.find()) {

def label = matcher.group(1)

def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv")

if (csvFile.exists()) {

return [csvFile, a_file]

} else {

return [file("NO_FILE"), a_file]

}

} else {

// if the file match does not exists null will be returned

// nextflow automatically handles nulls

return null

}

}

.set { filesToProcess }

// collect the volumes and combine results

volumes = collectVolumes(filesToProcess)

// after all files process they will be collected

combineVolumes(volumes.collect())

emit:

null

}

// Extract the mapping logic to a reusable function

def mapNiftiToCsv(niftiFiles) {

return niftiFiles.map { a_file ->

def matcher = a_file.name =~ /_label_([\w]+)\.nii.gz/

if (matcher.find()) {

def label = matcher.group(1)

def csvFile = file("${params.inputDir}/atlases/volume_labels_${label}.csv")

if (csvFile.exists()) {

return [csvFile, a_file]

} else {

return [file("NO_FILE"), a_file]

}

} else {

return null

}

}

}

workflow collectAndCombineVolumes{

take: majorityVoteOutput

main:

filesToProcess = mapNiftiToCsv(majorityVoteOutput)

// collect the volumes and combine results

volumes = collectVolumes(filesToProcess)

// after all files process they will be collected

combineVolumes(volumes.collect())

emit:

null

}

🤖 Prompt for AI Agents

In magetbrain.nf around lines 248 to 279, the file mapping logic inside the collectAndCombineVolumes workflow duplicates similar code found in collect_and_combine_volumes_niagara.nf. To fix this, extract the file mapping logic that processes majorityVoteOutput into a separate reusable function or module. Then call this function from both workflows to avoid duplication and adhere to the DRY principle.

Cian Monnin and others added 5 commits May 29, 2025 12:34

adjusting profile for niagara

aed1b7c

refactor: separate workflow for combine and collect

7b66045

chore: cleanups

016aead

bug: workaround for combine volumes on niagara

7d31eac

fix: removed process req from nextflow.config, only works with workflows

5568705

coderabbitai bot reviewed Jun 3, 2025

View reviewed changes

chore: coderabbit nitpicks

0b5a442

CMonnin merged commit b69cab0 into master Jun 3, 2025
2 checks passed

coderabbitai bot mentioned this pull request Jun 3, 2025

Chore/niagara docs #18

Closed

CMonnin deleted the refactor/niagara branch July 28, 2025 17:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor/niagara #17

Refactor/niagara #17

Uh oh!

CMonnin commented Jun 3, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jun 3, 2025 •

edited

Loading

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jun 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Refactor/niagara #17

Refactor/niagara #17

Uh oh!

Conversation

CMonnin commented Jun 3, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 3, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

CMonnin commented Jun 3, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jun 3, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)