feat: add savana for SV calling #373
base: master
Walkthrough

This update introduces a new Conda environment configuration file, …
Sequence Diagram(s)

sequenceDiagram
participant Scheduler as Workflow Scheduler
participant Rule as "savana Rule"
participant Env as Conda Environment (savana.yaml)
participant Tool as savana Tool
participant Log as Logging System
participant Result as Results Directory
Scheduler->>Rule: Trigger savana rule
Rule->>Env: Load environment configuration
Rule->>Rule: Validate inputs (ref, ref_idx, aln, index)
Rule->>Tool: Execute savana command with parameters
Tool-->>Log: Redirect errors to log file
Tool-->>Result: Write output BCF file
Actionable comments posted: 0
🧹 Nitpick comments (2)
workflow/envs/savana.yaml (1)
1-6: LGTM: Environment configuration looks good
The Conda environment configuration correctly specifies the required channels and the savana package with version 1.3.2.
Consider adding a newline at the end of the file to address the YAML linting warning.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 6-6: no new line character at the end of file
(new-line-at-end-of-file)
workflow/rules/candidate_calling.smk (1)
26-43: Savana rule implementation looks good, with some minor suggestions
The rule is well structured with appropriate inputs, outputs, and resource allocation. The shadow directive is a good practice to isolate the working directory.
Consider the following improvements:
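- The shell command relies on the *_sv_breakpoints.vcf glob matching exactly one file:

      - "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
      - " mv *_sv_breakpoints.vcf {output}) 2> {log}"
      + "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
      + " find . -name '*_sv_breakpoints.vcf' -exec mv {{}} {output} \\;) 2> {log}"

  Note that the find variant does not enforce a single match either: with no match it moves nothing, and with several it moves each one onto {output} in turn, so the last one wins.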
- The output file has a .bcf extension, but the command moves a .vcf file. Consider explicitly converting to BCF format:

      - "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
      - " mv *_sv_breakpoints.vcf {output}) 2> {log}"
      + "(savana to --tumour {input.aln} --ref {input.ref} --outdir . &&"
      + " bcftools view -Ob *_sv_breakpoints.vcf > {output}) 2> {log}"
- Consider dynamic thread allocation similar to other rules instead of hardcoding 8 threads.
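Both file-handling points above can be expressed as explicit checks. The following is a minimal Python sketch with two hypothetical helpers (not part of the PR): find_single_breakpoints_vcf fails unless exactly one *_sv_breakpoints.vcf exists in the output directory, and looks_like_bcf tests whether a file really is BCF by its leading bytes. The savana output naming is assumed from the glob in the rule, not verified against savana itself.

```python
import gzip
from pathlib import Path


def find_single_breakpoints_vcf(outdir):
    """Return the one *_sv_breakpoints.vcf file in outdir, or raise.

    Hypothetical helper; the file name pattern is taken from the rule's glob.
    """
    matches = sorted(Path(outdir).glob("*_sv_breakpoints.vcf"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one *_sv_breakpoints.vcf in {outdir}, "
            f"found {len(matches)}: {[m.name for m in matches]}"
        )
    return matches[0]


def looks_like_bcf(path):
    """Heuristic: is this file BCF rather than VCF text with a .bcf name?

    BCF starts with the magic bytes b"BCF", either directly (uncompressed)
    or inside a BGZF/gzip stream (compressed, as written by bcftools -Ob).
    """
    with open(path, "rb") as fh:
        head = fh.read(3)
    if head[:2] == b"\x1f\x8b":  # gzip/BGZF magic: look inside the stream
        with gzip.open(path, "rb") as gz:
            head = gz.read(3)
    return head == b"BCF"
```

Raising instead of silently moving files makes the zero-match and multi-match cases visible in the rule's log rather than surfacing later as a missing or wrong output.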
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- workflow/envs/savana.yaml (1 hunks)
- workflow/rules/candidate_calling.smk (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: test sra download
- GitHub Check: test primers
- GitHub Check: test no candidate filtering
- GitHub Check: test target regions, multiple BEDs
- GitHub Check: test target regions
- GitHub Check: test testcase generation
- GitHub Check: test local input
Actionable comments posted: 1
🧹 Nitpick comments (4)
.test/config-simple/config.yaml (1)
44-45: New savana entry added to calling configuration.
The addition of the savana section with activate: true expands the list of variant calling tools. Please verify that this configuration is consistent with the corresponding workflow rules (and the associated Conda environment in workflow/envs/savana.yaml) and that no additional parameters (such as germline_events) are needed in this file.
.test/config_primers/config.yaml (1)
46-47: Savana integration in primers configuration confirmed.
The new savana entry with activate: true is correctly introduced alongside the existing tools. Consider whether you need to include additional parameters (e.g., germline_events) for certain analyses, as seen in other configuration files, to ensure overall consistency.
workflow/rules/germline_snvs.smk (2)
1-12: Good implementation of the germline calls gathering rule.
The rule effectively uses bcftools concat with the -a parameter to allow overlapping records when gathering calls from Freebayes. The use of a pipe output is efficient as it avoids writing large intermediate files to disk.
Consider adding a resources directive to specify memory and runtime requirements, as bcftools concat can be resource-intensive for large datasets:

     rule gather_germline_calls:
         input:
             calls="results/calls/{group}.freebayes.{scatteritem}.bcf",
             idx="results/calls/{group}.freebayes.{scatteritem}.bcf.csi",
         output:
             pipe("results/germline-snvs/{group}.germline_snv_candidates.bcf"),
         log:
             "logs/germline-snvs/gather-calls/{group}.log",
         params:
             extra="-a",
    +    resources:
    +        mem_mb=4000,
    +        runtime=60
         wrapper:
             "v2.3.2/bio/bcftools/concat"
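Instead of a fixed mem_mb=4000, the memory request could grow with input size and retry attempt. A minimal sketch; the function name concat_mem_mb and the constants (1000 MB base, 2x input size, doubling per attempt) are illustrative assumptions, not values from the PR or from bcftools benchmarks:

```python
def concat_mem_mb(input_size_mb, attempt=1, base_mb=1000, factor=2.0):
    """Hypothetical memory estimate in MB for a bcftools concat job.

    Scales linearly with total input size and doubles on each retry,
    so a job killed for exceeding memory asks for more next time.
    """
    if attempt < 1:
        raise ValueError("attempt counting starts at 1")
    estimate = base_mb + factor * input_size_mb
    return int(estimate * 2 ** (attempt - 1))
```

In the rule this might be wired up as mem_mb=lambda wildcards, input, attempt: concat_mem_mb(input.size_mb, attempt), assuming a Snakemake version that exposes input.size_mb and passes attempt to resource callables; attempt only rises above 1 when the workflow is run with retries enabled.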
1-27: Consider adding integration with savana for SV calling, as mentioned in the PR title.
The PR title mentions adding savana for SV calling, but this file only contains rules for germline SNV processing. The rules look good on their own, but might be missing the integration with savana.
Would you like me to help draft the additional rule for savana integration? This would complement the existing germline SNV processing pipeline.
📜 Review details
📒 Files selected for processing (13)
- .github/workflows/main.yml (2 hunks)
- .test/config-chm-eval/config.yaml (1 hunks)
- .test/config-giab/config.yaml (1 hunks)
- .test/config-no-candidate-filtering/config.yaml (1 hunks)
- .test/config-simple/config.yaml (1 hunks)
- .test/config-sra/config.yaml (1 hunks)
- .test/config-target-regions/config.yaml (1 hunks)
- .test/config_primers/config.yaml (1 hunks)
- config/config.yaml (1 hunks)
- workflow/Snakefile (2 hunks)
- workflow/rules/candidate_calling.smk (1 hunks)
- workflow/rules/common.smk (1 hunks)
- workflow/rules/germline_snvs.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/main.yml
- workflow/Snakefile
- workflow/rules/candidate_calling.smk
🔇 Additional comments (8)
workflow/rules/common.smk (1)
47-47: New variable looks good for savana integration.
The addition of the germline_events variable properly handles the case when it's not defined in the configuration, defaulting to an empty list. This will be used by the new savana rule for conditional processing of germline SNVs.
.test/config-target-regions/config.yaml (1)
45-46: LGTM: Savana activation in config.
The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.
.test/config-sra/config.yaml (1)
42-43: LGTM: Savana activation in config.
The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.
.test/config-no-candidate-filtering/config.yaml (1)
45-46: LGTM: Savana activation in config.
The addition of savana to the calling section with activate: true is consistent with how other variant callers are configured.
.test/config-giab/config.yaml (1)
88-91: savana configuration with germline events added.
The new section defines savana with activate: true and adds a germline_events list containing present. Please confirm that the identifier "present" accurately represents the intended event to be used downstream in the workflow.
config/config.yaml (1)
115-120: Enhanced savana settings for CNV calling.
The savana entry now not only activates the tool but also specifies a germline_events list (with the value germline), which is required for triggering CNV calling. Make sure that the event name "germline" matches the expected identifier in your workflow and that all related components are updated accordingly.
.test/config-chm-eval/config.yaml (1)
44-47: Savana tool activated with germline events in CHM evaluation configuration.
This section correctly activates savana and assigns a germline_events list containing present. Please double-check that the value "present" is used consistently across the configuration and workflow rules, so that the intended candidate calls are correctly processed.
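The event-name questions above ("present" in the test configs, "germline" in config/config.yaml) can be surfaced mechanically. A sketch assuming each config file has already been parsed into a dict (for example with a YAML library); the helper names are hypothetical, and the key path calling/savana/germline_events is the one read in workflow/rules/common.smk. Differing names may well be intentional, so this only lists them for review:

```python
def collect_germline_events(configs):
    """Map each config file name to its calling/savana/germline_events list.

    configs: {filename: parsed config dict}. Missing keys yield [],
    mirroring the default=[] used in workflow/rules/common.smk.
    """
    result = {}
    for name, cfg in configs.items():
        calling = cfg.get("calling") or {}
        savana = calling.get("savana") or {}
        result[name] = list(savana.get("germline_events") or [])
    return result


def distinct_event_names(configs):
    """Return the sorted set of event names used across all configs."""
    names = set()
    for events in collect_germline_events(configs).values():
        names.update(events)
    return sorted(names)
```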
1-27: ✅ Verification successful
Verify the germline_events variable is properly defined.
I noticed the rule uses germline_events as a parameter, but there's no reference to where this variable is defined.
Please ensure the germline_events variable is properly defined and available in this context, likely in a common configuration file like workflow/rules/common.smk or in a config.yaml file.
🏁 Script executed:
    #!/bin/bash
    # Check if germline_events is defined in common.smk or similar files
    grep -r "germline_events" workflow/rules/

Length of output: 452
Verified: The germline_events variable is properly defined
I checked and confirmed that germline_events is defined in workflow/rules/common.smk via a lookup in the configuration:
- In workflow/rules/common.smk:

      germline_events = lookup("calling/savana/germline_events", within=config, default=[])

This definition makes it available for use in the control_fdr_germline_snvs rule, so no changes are required.
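The behaviour relied on here, getting [] back when calling/savana/germline_events is absent, can be illustrated with a small stand-in for a slash-separated path lookup. This is a hypothetical helper for illustration only, not Snakemake's lookup implementation, which has further options:

```python
_MISSING = object()  # sentinel so that default=None is still a valid default


def lookup_path(dpath, within, default=_MISSING):
    """Follow a slash-separated key path through nested dicts.

    Returns default if any key along the path is missing; raises
    KeyError if no default was given.
    """
    node = within
    for key in dpath.split("/"):
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif default is not _MISSING:
            return default
        else:
            raise KeyError(dpath)
    return node
```

With a config that only sets calling/savana/activate, lookup_path("calling/savana/germline_events", within=config, default=[]) returns [], so downstream code can iterate over it without a separate existence check.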
Actionable comments posted: 1
🧹 Nitpick comments (1)
workflow/envs/savana.yaml (1)
13-13: Add a newline at the end of the file.
A newline character is missing at the end of the file. POSIX defines a line as ending with a newline character, so adding one improves compatibility with line-oriented tools.
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 13-13: wrong indentation: expected 6 but found 4
(indentation)
[error] 13-13: no new line character at the end of file
(new-line-at-end-of-file)
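The missing final newline is usually fixed by hand, but it is easy to automate. A hypothetical sketch (ensure_trailing_newline is not part of the PR); the end-of-file-fixer hook from pre-commit-hooks covers the same case in a pre-commit setup:

```python
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline byte if the file is non-empty and lacks one.

    Returns True if the file was modified. Operates on raw bytes so the
    file's encoding is left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    if data and not data.endswith(b"\n"):
        p.write_bytes(data + b"\n")
        return True
    return False
```

The indentation warning on the pip entry is a separate issue and still needs the nested list moved to the expected column.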
📜 Review details
📒 Files selected for processing (2)
- workflow/envs/savana.yaml (1 hunks)
- workflow/rules/candidate_calling.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/candidate_calling.smk
Actionable comments posted: 0
🧹 Nitpick comments (2)
workflow/envs/savana.yaml (2)
5-12: Ensure Consistency in Dependency Version Specifiers.
While the dependencies are clearly declared, note that bcftools is specified as "bcftools =1.21" (a single equals sign, with a space), whereas the other packages use == (e.g., "pandas==2.0.0"). The two forms are not equivalent in conda: =1.21 is a fuzzy match that accepts any 1.21.x release, while == requests an exact version. For clarity and consistency, consider standardizing the version specifiers across all dependencies.
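Mixed operators like these can be listed automatically. A sketch that reports which operator each plain-string dependency uses; the regex is a deliberate simplification of conda's match-spec syntax (no channels, build strings, or ranges), and classify_specs is a hypothetical name:

```python
import re

# Simplified: package name, optional spaces, optional operator, version.
_SPEC = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|=|>=|<=|>|<)?\s*(\S*)\s*$")


def classify_specs(deps):
    """Return {package: operator} for plain string dependencies.

    The operator is '' for unpinned packages. Non-string entries, such as
    a nested {'pip': [...]} mapping, are skipped, as are strings the
    simplified regex cannot parse.
    """
    result = {}
    for dep in deps:
        if not isinstance(dep, str):
            continue
        m = _SPEC.match(dep)
        if m:
            name, op, _version = m.groups()
            result[name] = op or ""
    return result
```

For the dependencies quoted in this review, bcftools maps to "=" while pandas and matplotlib map to "=="; whether to pin exactly or allow patch releases is a project decision.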
13-14: Pip Dependency Block: Indentation Verified & Missing Newline at EOF.
The pip dependency block appears properly indented per earlier review suggestions. However, YAMLlint highlights that there is no newline character at the end of the file. Please add a newline after line 14 to eliminate this linting error.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
📒 Files selected for processing (1)
- workflow/envs/savana.yaml (1 hunks)
🔇 Additional comments (1)
workflow/envs/savana.yaml (1)
1-4: Channels Section is Configured Correctly.
The specified channels (conda-forge, bioconda, and nodefaults) are clear and well-organized.
Actionable comments posted: 1
📜 Review details
📒 Files selected for processing (1)
- workflow/envs/savana.yaml (1 hunks)
🔇 Additional comments (1)
workflow/envs/savana.yaml (1)
13-14: Pip dependency indentation is correct.
The pip dependency block now correctly indents the nested list with six spaces, which aligns with the previous refactoring advice.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)
      - pandas==2.0.0
      - matplotlib==3.7.1
      - pip:
          - git+https://github.com/johanneskoester/savana.git@fix/empty-calls

No newline at end of file
Add a newline at the end of the file.
YAML lint reports an error due to the missing newline at the end of the file. Please add a newline to ensure compliance with the linting guidelines.
@@
- - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
\ No newline at end of file
+ - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
+
📝 Committable suggestion
      - git+https://github.com/johanneskoester/savana.git@fix/empty-calls
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)