feat: add savana for SV calling #373

johanneskoester · 2025-03-28T12:53:27Z

Summary by CodeRabbit

New Features
- Introduced a Conda environment configuration for savana, specifying its channels and package dependencies.
- Added a new workflow step that generates structural variant (SV) candidate calls with the savana tool, running multi-threaded and logging to a dedicated file.
- Added a dedicated CI test job for the savana workflow.
- Added new processing rules for handling germline SNVs and candidate calls.
- Expanded configuration options for variant calling by integrating the savana tool across multiple configuration files.
- Added a new variable for managing germline events within the workflow.
- Enabled savana in the main configuration, with germline events specified so that CNV calling is triggered.
- Introduced new sections in configuration files to activate the savana tool and specify germline events for variant calling.

coderabbitai · 2025-03-28T12:53:51Z

Walkthrough

This update introduces a new Conda environment configuration file, savana.yaml, which specifies the channels and package dependencies for the savana tool. A new Snakemake rule named savana, which uses this environment, generates structural variant (SV) candidate calls; the rule defines its inputs, outputs, logging, and execution parameters. A new GitHub Actions job, "test savana," exercises the savana functionality in CI.

Changes

File(s)	Change Summary
workflow/envs/…/savana.yaml	New YAML configuration file specifying channels (`conda-forge`, `bioconda`, `nodefaults`) and dependencies for `savana` and other packages.
workflow/rules/…/candidate_calling.smk	New `savana` rule added with defined inputs (ref, ref_idx, aln, index), output BCF file, conda environment reference to `savana.yaml`, logging, and execution settings (16 threads).
.github/workflows/…/main.yml	New job `test savana` added that runs the workflow with the `only_savana` target and reports disk usage on error.
workflow/…/Snakefile	New `only_savana` target rule added that requests the `.savana.bcf` output for each sample.
.test/config-…/config.yaml	New `savana` section under `calling` with `activate: true` added to multiple test configuration files; some of them (config-giab, config-chm-eval) also set `germline_events` to `present`.
workflow/rules/…/common.smk	New variable `germline_events`, looked up from `calling/savana/germline_events` in the config with an empty-list default.
workflow/rules/…/germline_snvs.smk	Two new rules added: `gather_germline_calls` and `control_fdr_germline_snvs` for processing germline SNVs.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Workflow Scheduler
    participant Rule as "savana Rule"
    participant Env as Conda Environment (savana.yaml)
    participant Tool as savana Tool
    participant Log as Logging System
    participant Result as Results Directory

    Scheduler->>Rule: Trigger savana rule
    Rule->>Env: Load environment configuration
    Rule->>Rule: Validate inputs (ref, ref_idx, aln, index)
    Rule->>Tool: Execute savana command with parameters
    Tool-->>Log: Redirect errors to log file
    Tool-->>Result: Write output BCF file
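
For illustration, the "Execute savana command" step can be approximated by a small Python helper that assembles the shell string quoted in the candidate_calling.smk review. This is a sketch only: `build_savana_command` is a hypothetical name, and in the actual rule Snakemake fills in the `{input.*}`, `{output}` and `{log}` placeholders itself. The `savana to --tumour ... --ref ... --outdir .` invocation is copied from that review; no other savana options are assumed.

```python
import shlex


def build_savana_command(aln: str, ref: str, output: str, log: str) -> str:
    """Assemble the savana shell command discussed in this PR (hypothetical helper).

    Runs `savana to` on the tumour alignment with output in the current
    working directory (which the rule isolates via the shadow directive),
    then renames the breakpoint VCF to the rule output. The subshell's
    stderr, i.e. that of both steps, is redirected to the log file.
    """
    q = shlex.quote  # quote paths so spaces or shell metacharacters stay safe
    return (
        f"(savana to --tumour {q(aln)} --ref {q(ref)} --outdir . && "
        f"mv *_sv_breakpoints.vcf {q(output)}) 2> {q(log)}"
    )
```

Quoting each path with `shlex.quote` keeps the command valid even for paths containing spaces, while the `*_sv_breakpoints.vcf` glob is deliberately left unquoted so the shell expands it.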

Poem

Hoppin' along in the code delight,
I’ve added savana to guide our flight,
Channels flow like streams of cheer,
In every rule, I bring magic near,
A bunny leap in our workflow light! 🐇🥕


coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

workflow/envs/savana.yaml (1)

1-6: LGTM: Environment configuration looks good

The Conda environment configuration correctly specifies the required channels and the savana package with version 1.3.2.

Consider adding a newline at the end of the file to address the YAML linting warning.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 6-6: no new line character at the end of file

(new-line-at-end-of-file)

workflow/rules/candidate_calling.smk (1)

26-43: Savana rule implementation looks good, with some minor suggestions

The rule is well structured with appropriate inputs, outputs, and resource allocation. The shadow directive is a good practice to isolate the working directory.

Consider the following improvements:

The shell command relies on the *_sv_breakpoints.vcf glob matching exactly one file. An alternative that avoids shell globbing:
- "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
- " mv *_sv_breakpoints.vcf {output}) 2> {log}"
+ "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
+ " find . -name '*_sv_breakpoints.vcf' -exec mv {{}} {output} \\;) 2> {log}"
The output file has a .bcf extension but the command moves a .vcf file. Consider explicitly converting to BCF format:
- "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
- " mv *_sv_breakpoints.vcf {output}) 2> {log}"
+ "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
+ " bcftools view -Ob *_sv_breakpoints.vcf > {output}) 2> {log}"
Consider dynamic thread allocation similar to other rules instead of hardcoding 8 threads.
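
Both output-handling concerns above can be made explicit. Note that the `find ... -exec mv` variant does not actually enforce a single match: with no match it exits successfully without creating the output, and with several matches each `mv` overwrites the previous one. The Python sketch below (usable, for example, from a `run:` directive or a helper script; both function names are hypothetical) fails unless exactly one breakpoint VCF exists, and sniffs the leading bytes of a file to tell plain VCF text from BCF, so a `.bcf` file that merely contains renamed VCF text can be detected. It only inspects magic bytes; the conversion itself would still be done with `bcftools view -Ob` as suggested.

```python
import shutil
from pathlib import Path


def move_single_match(directory: str, pattern: str, dest: str) -> Path:
    """Move the single file in `directory` matching `pattern` to `dest`.

    Raises instead of guessing when the glob matches zero or several files.
    """
    matches = sorted(Path(directory).glob(pattern))
    if len(matches) != 1:
        names = [m.name for m in matches]
        raise RuntimeError(
            f"expected exactly one match for {pattern!r} in {directory}, got {names}"
        )
    return Path(shutil.move(str(matches[0]), dest))


def sniff_variant_file(path: str) -> str:
    """Classify a variant file by its leading bytes.

    Returns "vcf" for plain-text VCF, "bcf" for uncompressed BCF,
    "gzip" for gzip/BGZF data (compressed BCF as written by
    `bcftools view -Ob`, or a .vcf.gz), and "unknown" otherwise.
    """
    with open(path, "rb") as fh:
        head = fh.read(16)
    if head.startswith(b"##fileformat=VCF"):
        return "vcf"
    if head.startswith(b"BCF\x02"):
        return "bcf"
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    return "unknown"
```

A file named `*.bcf` for which `sniff_variant_file` returns "vcf" is exactly the extension/content mismatch described above.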

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c784f2 and daa3b42.

📒 Files selected for processing (2)

workflow/envs/savana.yaml (1 hunks)
workflow/rules/candidate_calling.smk (1 hunks)

🧰 Additional context used

🪛 YAMLlint (1.35.1)

workflow/envs/savana.yaml

[error] 6-6: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test sra download
GitHub Check: test primers
GitHub Check: test no candidate filtering
GitHub Check: test target regions, multiple BEDs
GitHub Check: test target regions
GitHub Check: test testcase generation
GitHub Check: test local input

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

.test/config-simple/config.yaml (1)

44-45: New savana entry added to calling configuration.
The addition of the savana section with activate: true expands the list of variant calling tools. Please verify that this configuration is consistent with the corresponding workflow rules (and the associated Conda environment in workflow/envs/savana.yaml) and that no additional parameters (such as germline_events) are needed in this file.

.test/config_primers/config.yaml (1)

46-47: Savana integration in primers configuration confirmed.
The new savana entry with activate: true is correctly introduced alongside the existing tools. Consider whether you need to include additional parameters (e.g., germline_events) for certain analyses, as seen in other configuration files, to ensure overall consistency.

workflow/rules/germline_snvs.smk (2)

1-12: Good implementation of the germline calls gathering rule.

The rule effectively uses bcftools concat with the -a parameter to allow overlapping records when gathering calls from Freebayes. The use of a pipe output is efficient as it avoids writing large intermediate files to disk.

Consider adding a resources directive to specify memory and CPU requirements, as bcftools concat can be resource-intensive for large datasets:
rule gather_germline_calls:
    input:
        calls="results/calls/{group}.freebayes.{scatteritem}.bcf",
        idx="results/calls/{group}.freebayes.{scatteritem}.bcf.csi",
    output:
        pipe("results/germline-snvs/{group}.germline_snv_candidates.bcf"),
    log:
        "logs/germline-snvs/gather-calls/{group}.log",
    params:
        extra="-a",
+   resources:
+       mem_mb=4000,
+       runtime=60
    wrapper:
        "v2.3.2/bio/bcftools/concat"

1-27: Consider adding integration with savana for SV calling, as mentioned in the PR title.

The PR title mentions adding savana for SV calling, but this file only contains rules for germline SNV processing. The rules look good on their own, but might be missing the integration with savana.

Would you like me to help draft the additional rule for savana integration? This would complement the existing germline SNV processing pipeline.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7e80e9 and 1ad170f.

📒 Files selected for processing (13)

.github/workflows/main.yml (2 hunks)
.test/config-chm-eval/config.yaml (1 hunks)
.test/config-giab/config.yaml (1 hunks)
.test/config-no-candidate-filtering/config.yaml (1 hunks)
.test/config-simple/config.yaml (1 hunks)
.test/config-sra/config.yaml (1 hunks)
.test/config-target-regions/config.yaml (1 hunks)
.test/config_primers/config.yaml (1 hunks)
config/config.yaml (1 hunks)
workflow/Snakefile (2 hunks)
workflow/rules/candidate_calling.smk (1 hunks)
workflow/rules/common.smk (1 hunks)
workflow/rules/germline_snvs.smk (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

.github/workflows/main.yml
workflow/Snakefile
workflow/rules/candidate_calling.smk

🔇 Additional comments (8)

workflow/rules/common.smk (1)

47-47: New variable looks good for savana integration.

The new germline_events variable properly handles the case where it is not defined in the configuration by defaulting to an empty list. It is used by the new germline SNV rules in workflow/rules/germline_snvs.smk for conditional processing of germline SNVs.

.test/config-target-regions/config.yaml (1)

45-46: LGTM: Savana activation in config.

The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.

.test/config-sra/config.yaml (1)

42-43: LGTM: Savana activation in config.

The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.

.test/config-no-candidate-filtering/config.yaml (1)

45-46: LGTM: Savana activation in config.

The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.

.test/config-giab/config.yaml (1)

88-91: savana configuration with germline events added.
The new section defines savana with activate: true and adds a germline_events list containing present. Please confirm that the identifier "present" accurately represents the intended event to be used downstream in the workflow.

config/config.yaml (1)

115-120: Enhanced savana settings for CNV calling.
The savana entry now not only activates the tool but also specifies a germline_events list containing "germline", which is required to trigger CNV calling. Make sure that the event name "germline" matches the identifier expected by the workflow and that all related components are updated accordingly.

.test/config-chm-eval/config.yaml (1)

44-47: Savana tool activated with germline events in CHM evaluation configuration.
This section correctly activates savana and assigns a germline_events list containing present. Please double-check that the value "present" is used consistently across the configuration and workflow rules, so that the intended candidate calls are correctly processed.
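
To follow up on the consistency question raised for "present" versus "germline", a short Python sketch can map each germline_events value to the configuration files that use it. The values below are taken from the review comments above; the configs are passed as already-parsed dicts (loading the YAML files, e.g. with PyYAML, is left out), and `collect_germline_events` is a hypothetical helper, not part of the workflow.

```python
from typing import Any


def collect_germline_events(configs: dict[str, dict[str, Any]]) -> dict[str, set[str]]:
    """Map each calling/savana/germline_events value to the configs that use it."""
    usage: dict[str, set[str]] = {}
    for name, cfg in configs.items():
        savana = cfg.get("calling", {}).get("savana", {})
        for event in savana.get("germline_events", []):
            usage.setdefault(event, set()).add(name)
    return usage


# Values as described in the review comments above.
configs = {
    "config/config.yaml": {"calling": {"savana": {"activate": True, "germline_events": ["germline"]}}},
    ".test/config-giab/config.yaml": {"calling": {"savana": {"activate": True, "germline_events": ["present"]}}},
    ".test/config-chm-eval/config.yaml": {"calling": {"savana": {"activate": True, "germline_events": ["present"]}}},
    ".test/config-simple/config.yaml": {"calling": {"savana": {"activate": True}}},
}

for event, files in sorted(collect_germline_events(configs).items()):
    print(event, sorted(files))
```

Seeing more than one value does not by itself indicate a bug; it simply lists which names need to be checked against the event names the workflow expects.
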

workflow/rules/germline_snvs.smk (1)

1-27:
✅ Verification successful

Verify the germline_events variable is properly defined.

I noticed the rule uses germline_events as a parameter, but there's no reference to where this variable is defined.

Please ensure the germline_events variable is properly defined and available in this context, likely in a common configuration file like workflow/rules/common.smk or in a config.yaml file.

🏁 Script executed:
#!/bin/bash
# Check if germline_events is defined in common.smk or similar files
grep -r "germline_events" workflow/rules/
Length of output: 452
Verified: The germline_events variable is properly defined

I checked and confirmed that germline_events is defined in workflow/rules/common.smk via a lookup in the configuration:

In workflow/rules/common.smk:
germline_events = lookup("calling/savana/germline_events", within=config, default=[])

This definition makes it available for use in the control_fdr_germline_snvs rule, so no changes are required.
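
The behaviour relied on here (a slash-separated path into the nested config, falling back to an empty list when any level is missing) can be illustrated with a small Python sketch. This is not Snakemake's implementation of `lookup`, which handles more than plain key paths; `lookup_path` is a hypothetical name used only for illustration.

```python
from typing import Any


def lookup_path(path: str, within: dict[str, Any], default: Any = None) -> Any:
    """Follow a slash-separated key path through nested dicts.

    Returns `default` as soon as a key is missing or an intermediate value
    is not a dict, mirroring how germline_events falls back to [].
    """
    node: Any = within
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


config = {"calling": {"savana": {"activate": True, "germline_events": ["present"]}}}
germline_events = lookup_path("calling/savana/germline_events", within=config, default=[])
```

Because the default is an empty list, downstream rules can iterate over germline_events unconditionally; a config that omits the key simply yields no germline events.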

workflow/rules/germline_snvs.smk

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

workflow/envs/savana.yaml (1)

13-13: Add a newline at the end-of-file.
A newline character is missing at the end of the file. Adding one improves compatibility with various tools and adheres to best practice standards.
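
For reference, YAMLlint's new-line-at-end-of-file rule requires the file to end with a newline character. A minimal Python sketch that applies this fix in place (the function name is hypothetical; rerunning yamllint afterwards remains the authoritative check):

```python
from pathlib import Path


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the file is non-empty and does not end with one.

    Returns True if the file was modified, False otherwise.
    """
    p = Path(path)
    data = p.read_bytes()
    if data and not data.endswith(b"\n"):
        p.write_bytes(data + b"\n")
        return True
    return False
```

The function is idempotent: a second call on the same file leaves it untouched and returns False.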

🧰 Tools

🪛 YAMLlint (1.35.1)

[warning] 13-13: wrong indentation: expected 6 but found 4

(indentation)

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88956f4 and 0b0d111.

📒 Files selected for processing (2)

workflow/envs/savana.yaml (1 hunks)
workflow/rules/candidate_calling.smk (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

workflow/rules/candidate_calling.smk

🧰 Additional context used

🪛 YAMLlint (1.35.1)

workflow/envs/savana.yaml

[warning] 13-13: wrong indentation: expected 6 but found 4

(indentation)

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test sra download
GitHub Check: test primers
GitHub Check: test no candidate filtering
GitHub Check: test target regions, multiple BEDs
GitHub Check: test target regions
GitHub Check: test testcase generation
GitHub Check: test local input

workflow/envs/savana.yaml

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

workflow/envs/savana.yaml (2)

5-12: Ensure Consistency in Dependency Version Specifiers.

While the dependencies are clearly declared, note that bcftools is specified as "bcftools =1.21" (a single equals sign, preceded by a space), whereas the other packages use == (e.g., "pandas==2.0.0"). The two forms are not equivalent in conda: "=1.21" matches any 1.21.x release, whereas "==2.0.0" pins that exact version. For clarity and consistency, consider standardizing the version specifiers across all dependencies.
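
A quick way to spot mixed specifier styles is to parse each plain string dependency into name, operator and version. The sketch below assumes simple `name<op>version` strings as in this environment file (the pip sub-list, build strings and comma-separated constraints are not handled), and `specifier_styles` is a hypothetical helper.

```python
import re

# Package name, optional comparison operator, optional version.
# "==" is listed before "=" so that the longer operator wins.
SPEC_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|>=|<=|!=|=|>|<)?\s*(\S*)\s*$")


def specifier_styles(deps: list[str]) -> dict[str, str]:
    """Return {package: operator} for simple conda dependency strings ("" if unpinned)."""
    styles: dict[str, str] = {}
    for dep in deps:
        match = SPEC_RE.match(dep)
        if match is None:
            raise ValueError(f"unsupported dependency spec: {dep!r}")
        name, op, _version = match.groups()
        styles[name] = op or ""
    return styles


deps = ["bcftools =1.21", "pandas==2.0.0", "matplotlib==3.7.1"]
styles = specifier_styles(deps)
mixed = len(set(styles.values())) > 1  # True here: both "=" and "==" are in use
```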

13-14: Pip Dependency Block: Indentation Verified & Missing Newline at EOF.

The pip dependency block appears properly indented per earlier review suggestions. However, YAMLlint highlights that there is no newline character at the end of the file. Please add a newline after line 14 to eliminate this linting error.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0d111 and 53294d0.

📒 Files selected for processing (1)

workflow/envs/savana.yaml (1 hunks)

🧰 Additional context used

🪛 YAMLlint (1.35.1)

workflow/envs/savana.yaml

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test sra download
GitHub Check: test primers
GitHub Check: test no candidate filtering
GitHub Check: test target regions, multiple BEDs
GitHub Check: test target regions
GitHub Check: test testcase generation
GitHub Check: test local input

🔇 Additional comments (1)

workflow/envs/savana.yaml (1)

1-4: Channels Section is Configured Correctly.

The specified channels (conda-forge, bioconda, and nodefaults) are clear and well-organized.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a9d4ef and b535811.

📒 Files selected for processing (1)

workflow/envs/savana.yaml (1 hunks)

🧰 Additional context used

🪛 YAMLlint (1.35.1)

workflow/envs/savana.yaml

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms (7)

GitHub Check: test sra download
GitHub Check: test primers
GitHub Check: test no candidate filtering
GitHub Check: test target regions, multiple BEDs
GitHub Check: test target regions
GitHub Check: test testcase generation
GitHub Check: test local input

🔇 Additional comments (1)

workflow/envs/savana.yaml (1)

13-14: Pip dependency indentation is correct.
The pip dependency block now correctly indents the nested list with six spaces, which aligns with the previous refactoring advice.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

coderabbitai · 2025-03-28T17:17:45Z

workflow/envs/savana.yaml

+  - pandas==2.0.0
+  - matplotlib==3.7.1
+  - pip:
+      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls


⚠️ Potential issue

Add a newline at the end of the file.
YAML lint reports an error due to the missing newline at the end of the file. Please add a newline to ensure compliance with the linting guidelines.

@@
-      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
\ No newline at end of file
+      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (the line itself is unchanged; only a trailing newline is added):

- git+https://github.com/johanneskoester/savana.git@fix/empty-calls

- git+https://github.com/johanneskoester/savana.git@fix/empty-calls

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 14-14: no new line character at the end of file

(new-line-at-end-of-file)

feat: add savana for SV calling (daa3b42)

coderabbitai bot reviewed Mar 28, 2025

johanneskoester added 5 commits March 28, 2025 14:02
- add target and test (a32739c)
- fix (b7e80e9)
- annotate access pattern (63f7ef9)
- CNV calling and germline event handling (9a7cdd6)
- add fdr parameter (1ad170f)

coderabbitai bot reviewed Mar 28, 2025

workflow/rules/germline_snvs.smk (resolved)

johanneskoester added 2 commits March 28, 2025 14:57
- extension (88956f4)
- use my savana fork for now (0b0d111)

coderabbitai bot reviewed Mar 28, 2025

workflow/envs/savana.yaml (outdated, resolved)

add bcftools (53294d0)

coderabbitai bot reviewed Mar 28, 2025

johanneskoester added 2 commits March 28, 2025 18:10
- set threads (1a9d4ef)
- fix env (b535811)

coderabbitai bot reviewed Mar 28, 2025
