@johanneskoester johanneskoester commented Mar 28, 2025

Summary by CodeRabbit

  • New Features
    • Introduced an environment configuration for managing dependencies, including curated package sources.
    • Added a new workflow step that processes candidate calls using the savana tool, featuring enhanced multi-threaded performance and dedicated logging.
    • Introduced a targeted testing job for the savana workflow, enhancing testing capabilities.
    • Added new processing rules for handling germline SNVs and candidate calls.
    • Expanded configuration options for variant calling by integrating the savana tool across multiple configuration files.
    • Added a new variable for managing germline events within the workflow.
    • Enhanced CNV calling functionality through the savana tool in the main configuration.
    • Introduced new sections in configuration files to activate the savana tool and specify germline events for variant calling.

coderabbitai bot commented Mar 28, 2025

Walkthrough

This update introduces a new Conda environment configuration file, savana.yaml, which specifies channels and a package dependency for the savana tool. Additionally, a new rule named savana is added to the Snakemake workflow for processing candidate calls, utilizing the new environment. The rule defines its inputs, outputs, logging, and execution parameters. A new job, "test savana," is also added to the GitHub Actions workflow to enhance testing capabilities for the savana functionality.

Changes

| File(s) | Change Summary |
|---|---|
| workflow/envs/…/savana.yaml | New YAML configuration file specifying channels (conda-forge, bioconda, nodefaults) and dependencies for savana and other packages. |
| workflow/rules/…/candidate_calling.smk | New savana rule added with defined inputs (ref, ref_idx, aln, index), output BCF file, conda environment reference to savana.yaml, logging, and execution settings (16 threads). |
| .github/workflows/…/main.yml | New job test savana added to run with argument only_savana, enhancing error reporting with disk usage on error. |
| workflow/…/Snakefile | New only_savana rule added to handle .savana.bcf files for each sample, specifying input file paths. |
| .test/config-…/config.yaml | New section under calling for savana with properties activate: true and germline_events: present added in multiple test configuration files. |
| workflow/rules/…/common.smk | New variable germline_events initialized for use in the workflow. |
| workflow/rules/…/germline_snvs.smk | Two new rules added: gather_germline_calls and control_fdr_germline_snvs for processing germline SNVs. |

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Workflow Scheduler
    participant Rule as "savana Rule"
    participant Env as Conda Environment (savana.yaml)
    participant Tool as savana Tool
    participant Log as Logging System
    participant Result as Results Directory

    Scheduler->>Rule: Trigger savana rule
    Rule->>Env: Load environment configuration
    Rule->>Rule: Validate inputs (ref, ref_idx, aln, index)
    Rule->>Tool: Execute savana command with parameters
    Tool-->>Log: Redirect errors to log file
    Tool-->>Result: Write output BCF file

Poem

Hoppin' along in the code delight,
I’ve added savana to guide our flight,
Channels flow like streams of cheer,
In every rule, I bring magic near,
A bunny leap in our workflow light! 🐇🥕


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
workflow/envs/savana.yaml (1)

1-6: LGTM: Environment configuration looks good

The Conda environment configuration correctly specifies the required channels and the savana package with version 1.3.2.

Consider adding a newline at the end of the file to address the YAML linting warning.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 6-6: no new line character at the end of file

(new-line-at-end-of-file)
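
The new-line-at-end-of-file finding can be checked and fixed mechanically. The following is a minimal sketch, assuming a plain file on disk; the helper name `ensure_trailing_newline` is hypothetical and is not part of YAMLlint or this workflow.

```python
from pathlib import Path


def ensure_trailing_newline(path: str) -> bool:
    """Append a final newline if the file lacks one.

    Returns True when the file was modified, False otherwise.
    Hypothetical helper for illustration only.
    """
    p = Path(path)
    data = p.read_bytes()
    # Empty files and files that already end in "\n" are left untouched.
    if not data or data.endswith(b"\n"):
        return False
    p.write_bytes(data + b"\n")
    return True
```

Running it twice on the same file is safe: the second call finds the newline already present and changes nothing.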

workflow/rules/candidate_calling.smk (1)

26-43: Savana rule implementation looks good, with some minor suggestions

The rule is well structured with appropriate inputs, outputs, and resource allocation. The shadow directive is a good practice to isolate the working directory.

Consider the following improvements:

  1. The shell command relies on *_sv_breakpoints.vcf pattern matching exactly one file:
- "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
- " mv *_sv_breakpoints.vcf {output}) 2> {log}"
+ "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
+ " find . -name '*_sv_breakpoints.vcf' -exec mv {{}} {output} \\;) 2> {log}"
  2. The output file has a .bcf extension but the command moves a .vcf file. Consider explicitly converting to BCF format:
- "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
- " mv *_sv_breakpoints.vcf {output}) 2> {log}"
+ "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
+ " bcftools view -Ob *_sv_breakpoints.vcf > {output}) 2> {log}"
  3. Consider dynamic thread allocation similar to other rules instead of hardcoding 8 threads.
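
The concern in the first suggestion is that the glob may match zero or several files. One way to make that assumption explicit is to resolve the match up front and fail loudly otherwise. This is only a sketch: `single_match` is a hypothetical helper, not something savana or this workflow provides, and the default pattern is taken from the shell command quoted above.

```python
from pathlib import Path


def single_match(directory: str, pattern: str = "*_sv_breakpoints.vcf") -> Path:
    """Return the only file in `directory` matching `pattern`.

    Raises RuntimeError when the glob matches zero or more than one file,
    instead of silently moving the wrong file.
    """
    matches = sorted(Path(directory).glob(pattern))
    if len(matches) != 1:
        raise RuntimeError(
            f"expected exactly one file matching {pattern!r} in {directory!r}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

Something like this could be called from a `run:` block or a small script before the conversion to BCF, so that an unexpected number of output files surfaces as an error rather than as a corrupted result.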
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c784f2 and daa3b42.

📒 Files selected for processing (2)
  • workflow/envs/savana.yaml (1 hunks)
  • workflow/rules/candidate_calling.smk (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
workflow/envs/savana.yaml

[error] 6-6: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test sra download
  • GitHub Check: test primers
  • GitHub Check: test no candidate filtering
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test target regions
  • GitHub Check: test testcase generation
  • GitHub Check: test local input

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
.test/config-simple/config.yaml (1)

44-45: New savana entry added to calling configuration.
The addition of the savana section with activate: true expands the list of variant calling tools. Please verify that this configuration is consistent with the corresponding workflow rules (and the associated Conda environment in workflow/envs/savana.yaml) and that no additional parameters (such as germline_events) are needed in this file.

.test/config_primers/config.yaml (1)

46-47: Savana integration in primers configuration confirmed.
The new savana entry with activate: true is correctly introduced alongside the existing tools. Consider whether you need to include additional parameters (e.g., germline_events) for certain analyses, as seen in other configuration files, to ensure overall consistency.

workflow/rules/germline_snvs.smk (2)

1-12: Good implementation of the germline calls gathering rule.

The rule effectively uses bcftools concat with the -a parameter to allow overlapping records when gathering calls from Freebayes. The use of a pipe output is efficient as it avoids writing large intermediate files to disk.

Consider adding a resources directive to specify memory and CPU requirements, as bcftools concat can be resource-intensive for large datasets:

rule gather_germline_calls:
    input:
        calls="results/calls/{group}.freebayes.{scatteritem}.bcf",
        idx="results/calls/{group}.freebayes.{scatteritem}.bcf.csi",
    output:
        pipe("results/germline-snvs/{group}.germline_snv_candidates.bcf"),
    log:
        "logs/germline-snvs/gather-calls/{group}.log",
    params:
        extra="-a",
+   resources:
+       mem_mb=4000,
+       runtime=60
    wrapper:
        "v2.3.2/bio/bcftools/concat"

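If a fixed mem_mb turns out to be too small for some groups, one alternative is to scale the request with the retry attempt. The sketch below is an assumption layered on top of the suggestion above, not part of the reviewed change; `scaled_mem_mb`, the 4000 MB base (taken from the suggested value) and the 32000 MB cap are illustrative.

```python
def scaled_mem_mb(attempt: int, base_mb: int = 4000, cap_mb: int = 32000) -> int:
    """Memory request in MB: base_mb on the first attempt, doubled on each
    retry, never exceeding cap_mb. Values are illustrative assumptions."""
    if attempt < 1:
        raise ValueError("attempt numbering starts at 1")
    return min(base_mb * 2 ** (attempt - 1), cap_mb)
```

Snakemake accepts callables for resources, so this could be wired in as `mem_mb=lambda wildcards, attempt: scaled_mem_mb(attempt)` when the workflow is run with retries enabled.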
1-27: Consider adding integration with savana for SV calling as mentioned in PR title.

The PR title mentions adding savana for SV calling, but this file only contains rules for germline SNV processing. The rules look good on their own, but might be missing the integration with savana.

Would you like me to help draft the additional rule for savana integration? This would complement the existing germline SNV processing pipeline.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7e80e9 and 1ad170f.

📒 Files selected for processing (13)
  • .github/workflows/main.yml (2 hunks)
  • .test/config-chm-eval/config.yaml (1 hunks)
  • .test/config-giab/config.yaml (1 hunks)
  • .test/config-no-candidate-filtering/config.yaml (1 hunks)
  • .test/config-simple/config.yaml (1 hunks)
  • .test/config-sra/config.yaml (1 hunks)
  • .test/config-target-regions/config.yaml (1 hunks)
  • .test/config_primers/config.yaml (1 hunks)
  • config/config.yaml (1 hunks)
  • workflow/Snakefile (2 hunks)
  • workflow/rules/candidate_calling.smk (1 hunks)
  • workflow/rules/common.smk (1 hunks)
  • workflow/rules/germline_snvs.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/main.yml
  • workflow/Snakefile
  • workflow/rules/candidate_calling.smk
🔇 Additional comments (8)
workflow/rules/common.smk (1)

47-47: New variable looks good for savana integration.

The addition of germline_events variable properly handles the case when it's not defined in the configuration, defaulting to an empty list. This will be used by the new savana rule for conditional processing of germline SNVs.

.test/config-target-regions/config.yaml (1)

45-46: LGTM: Savana activation in config.

The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.

.test/config-sra/config.yaml (1)

42-43: LGTM: Savana activation in config.

The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.

.test/config-no-candidate-filtering/config.yaml (1)

45-46: LGTM: Savana activation in config.

The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.

.test/config-giab/config.yaml (1)

88-91: savana configuration with germline events added.
The new section defines savana with activate: true and adds a germline_events list containing present. Please confirm that the identifier "present" accurately represents the intended event to be used downstream in the workflow.

config/config.yaml (1)

115-120: Enhanced savana settings for CNV calling.
The savana entry now not only activates the tool but also specifies a germline_events list (with the value - germline), which is required for triggering CNV calling. Make sure that the event name "germline" matches the expected identifier in your workflow and that all related components are updated accordingly.

.test/config-chm-eval/config.yaml (1)

44-47: Savana tool activated with germline events in CHM evaluation configuration.
This section correctly activates savana and assigns a germline_events list containing present. Please double-check that the value "present" is used consistently across the configuration and workflow rules, so that the intended candidate calls are correctly processed.

workflow/rules/germline_snvs.smk (1)

1-27:

✅ Verification successful

Verify the germline_events variable is properly defined.

I noticed the rule uses germline_events as a parameter, but there's no reference to where this variable is defined.

Please ensure the germline_events variable is properly defined and available in this context, likely in a common configuration file like workflow/rules/common.smk or in a config.yaml file.


🏁 Script executed:

#!/bin/bash
# Check if germline_events is defined in common.smk or similar files
grep -r "germline_events" workflow/rules/

Length of output: 452


Verified: The germline_events variable is properly defined

I checked and confirmed that germline_events is defined in workflow/rules/common.smk via a lookup in the configuration:

  • In workflow/rules/common.smk:
    germline_events = lookup("calling/savana/germline_events", within=config, default=[])

This definition makes it available for use in the control_fdr_germline_snvs rule, so no changes are required.
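
To illustrate why the default matters for configurations that omit germline_events, here is a minimal stand-in for that lookup. It is a sketch only: `lookup_path` is a hypothetical function that mimics the path-with-default behaviour described above and is not Snakemake's `lookup` implementation.

```python
from typing import Any


def lookup_path(config: dict, dpath: str, default: Any = None) -> Any:
    """Walk a nested dict along a '/'-separated path.

    Returns `default` as soon as a key is missing or a non-dict is reached.
    Hypothetical stand-in for illustration.
    """
    node: Any = config
    for key in dpath.split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


# A config with germline_events set (as in config-giab) and one without.
with_events = {"calling": {"savana": {"activate": True, "germline_events": ["present"]}}}
without_events = {"calling": {"savana": {"activate": True}}}

events_a = lookup_path(with_events, "calling/savana/germline_events", default=[])
events_b = lookup_path(without_events, "calling/savana/germline_events", default=[])
```

With the empty-list default, test configurations that only set activate: true still yield a usable value, so downstream rules can iterate over germline_events without a separate existence check.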

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
workflow/envs/savana.yaml (1)

13-13: Add a newline at the end-of-file.
A newline character is missing at the end of the file. Adding one improves compatibility with various tools and adheres to best practice standards.

🧰 Tools
🪛 YAMLlint (1.35.1)

[warning] 13-13: wrong indentation: expected 6 but found 4

(indentation)


[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88956f4 and 0b0d111.

📒 Files selected for processing (2)
  • workflow/envs/savana.yaml (1 hunks)
  • workflow/rules/candidate_calling.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/rules/candidate_calling.smk
🧰 Additional context used
🪛 YAMLlint (1.35.1)
workflow/envs/savana.yaml

[warning] 13-13: wrong indentation: expected 6 but found 4

(indentation)


[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test sra download
  • GitHub Check: test primers
  • GitHub Check: test no candidate filtering
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test target regions
  • GitHub Check: test testcase generation
  • GitHub Check: test local input

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
workflow/envs/savana.yaml (2)

5-12: Ensure Consistency in Dependency Version Specifiers.

While the dependencies are clearly declared, note that bcftools is specified as "bcftools =1.21" (using a single equals sign with spaces) whereas the other packages use == (e.g., "pandas==2.0.0"). For clarity and consistency, consider standardizing the version specifiers across all dependencies.
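
The two operators are not only a style difference. As far as I understand conda's match specifications, a single = is a fuzzy constraint (=1.21 also accepts 1.21.1), while == requires an exact version. The sketch below is a deliberately simplified model of that distinction; `version_matches` is hypothetical, ignores build strings and conda's real version ordering, and should not be read as conda's matcher.

```python
def version_matches(operator: str, version: str, candidate: str) -> bool:
    """Simplified model of '=' (fuzzy prefix) versus '==' (exact) pins."""
    if operator == "==":
        # Exact pin: only the identical version string is accepted.
        return candidate == version
    if operator == "=":
        # Fuzzy pin: the version itself or any release below it, e.g. 1.21.x.
        return candidate == version or candidate.startswith(version + ".")
    raise ValueError(f"unsupported operator: {operator!r}")
```

Under this model, standardizing on == would also tighten the bcftools pin, which may or may not be the intent.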


13-14: Pip Dependency Block: Indentation Verified & Missing Newline at EOF.

The pip dependency block appears properly indented per earlier review suggestions. However, YAMLlint highlights that there is no newline character at the end of the file. Please add a newline after line 14 to eliminate this linting error.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0d111 and 53294d0.

📒 Files selected for processing (1)
  • workflow/envs/savana.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
workflow/envs/savana.yaml

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test sra download
  • GitHub Check: test primers
  • GitHub Check: test no candidate filtering
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test target regions
  • GitHub Check: test testcase generation
  • GitHub Check: test local input
🔇 Additional comments (1)
workflow/envs/savana.yaml (1)

1-4: Channels Section is Configured Correctly.

The specified channels (conda-forge, bioconda, and nodefaults) are clear and well-organized.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a9d4ef and b535811.

📒 Files selected for processing (1)
  • workflow/envs/savana.yaml (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.35.1)
workflow/envs/savana.yaml

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test sra download
  • GitHub Check: test primers
  • GitHub Check: test no candidate filtering
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test target regions
  • GitHub Check: test testcase generation
  • GitHub Check: test local input
🔇 Additional comments (1)
workflow/envs/savana.yaml (1)

13-14: Pip dependency indentation is correct.
The pip dependency block now correctly indents the nested list with six spaces, which aligns with the previous refactoring advice.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

  - pandas==2.0.0
  - matplotlib==3.7.1
  - pip:
      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
\ No newline at end of file
⚠️ Potential issue

Add a newline at the end of the file.
YAML lint reports an error due to the missing newline at the end of the file. Please add a newline to ensure compliance with the linting guidelines.

@@
-      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
\ No newline at end of file
+      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
+ 
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
+      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)
