-
Notifications
You must be signed in to change notification settings - Fork 43
feat: integrate hla typing with orthanq #391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds optional HLA typing: new Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Snakefile
participant Common as common.smk
participant HLA as hla_typing.smk
participant Conda as envs/orthanq.yaml
User->>Snakefile: run workflow
Snakefile->>Common: load config (hla_typing.activate, loci)
alt hla_typing.activate == true
Snakefile->>HLA: include HLA rules
HLA->>HLA: get_hla_genes_and_xml -> hla_gen.fasta, hla.xml.zip
HLA->>HLA: unzip_xml -> hla.xml
HLA->>Conda: run orthanq candidate & call steps (orthanq env)
HLA->>HLA: orthanq_candidate_variants (per locus) -> candidate BCFs
HLA->>HLA: scatter / gather -> per-group BCFs
HLA->>HLA: orthanq_call_hla -> results/hla-typing/{group}-{caller}/...
Common->>Common: append HLA outputs to final output list
else
Common->>Common: proceed without HLA rules/outputs
end
sequenceDiagram
autonumber
participant Common as common.smk
participant Scatter as get_scattered_calls
participant Orthanq as orthanq-<locus>
participant Variants as variant_caller
participant Fusions as arriba
Common->>Scatter: get_scattered_calls(type)
alt type == "hla-variants"
Scatter->>Orthanq: map to orthanq per-locus callers
else type == "variants"
Scatter->>Variants: map to configured variant_caller
else type == "fusions"
Scatter->>Fusions: map to arriba
else
Scatter-->>Common: raise ValueError
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
workflow/rules/hla_typing.smk (1)
39-41: Address TODOs for production readiness.The TODO comments highlight important missing functionality that should be addressed before this feature is complete:
- Additional outputs like plots
- Missing inputs and commands for the orthanq_candidate_variants rule
- Handling deactivation of biases in varlociraptor via INFO tags
Would you like me to help research the complete orthanq command-line interface and create a more complete implementation for the
orthanq_candidate_variantsrule?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
config/config.yaml(1 hunks)workflow/Snakefile(1 hunks)workflow/rules/candidate_calling.smk(1 hunks)workflow/rules/common.smk(3 hunks)workflow/rules/hla_typing.smk(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-25T12:34:09.018Z
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#338
File: workflow/rules/candidate_calling.smk:86-87
Timestamp: 2024-11-25T12:34:09.018Z
Learning: In this workflow, the grouping of variant calling rules is intentional, and not all related variant calling rules need to be grouped under "calling".
Applied to files:
workflow/rules/hla_typing.smk
🪛 GitHub Actions: Tests
workflow/Snakefile
[error] 1-1: Snakemake failed during execution. Command: snakemake --directory .test --snakefile workflow/Snakefile --show-failed-logs --conda-cleanup-pkgs cache --sdm conda --cores 10 --configfile .test/config-sra/config.yaml
workflow/rules/hla_typing.smk
[error] 1-1: RuleException: Input and output files have to be specified as strings or lists of strings.
🔇 Additional comments (8)
config/config.yaml (1)
81-82: Good integration of HLA typing feature flag.The addition of the
hla_typing.activateboolean flag follows the established pattern used throughout the config for feature toggles and provides a clean way to enable/disable HLA typing functionality.workflow/Snakefile (1)
47-47: LGTM! Proper inclusion of HLA typing rules.The addition of
include: "rules/hla_typing.smk"follows the established pattern and correctly integrates the HLA typing functionality into the workflow.workflow/rules/candidate_calling.smk (1)
75-75: Good refactoring to use centralized input selection.The change from conditional input selection to using
get_scatter_candidates_inputcentralizes the logic and supports the new HLA typing workflow. This is a clean refactoring that improves maintainability.workflow/rules/common.smk (3)
713-723: Excellent pattern matching implementation for caller selection.The use of structural pattern matching with explicit handling for "fusions", "variants", and the new "hla-variants" case is clean and follows Python best practices. The addition of "orthanq" as the caller for "hla-variants" aligns well with Orthanq being a statistical model for transparent and uncertainty-aware quantification of haplotypes used for HLA typing.
773-779: Well-designed helper function for candidate input selection.The
get_scatter_candidates_inputfunction provides a clean abstraction for determining the appropriate input source based on the caller type and configuration. The logic correctly handles:
- orthanq-specific candidate files
- target region filtering
- fallback to fixed candidate calls
This design supports the integration of HLA typing while maintaining backward compatibility.
166-172: HLA typing output integration verified. The expand call in workflow/rules/common.smk (results/hla-typing/{item.group}.{item.sample}.tsv) matches the orthanq_call rule output in workflow/rules/hla_typing.smk (results/hla-typing/{group}.{sample}.tsv).workflow/rules/hla_typing.smk (2)
12-25: LGTM! Well-structured rule for gathering HLA variant calls.The
gather_annotated_callsrule correctly:
- Uses the scatter-gather pattern with
gather.calling()- Follows established patterns from other gather rules in the codebase
- Uses the appropriate bcftools/concat wrapper
- Includes proper logging and grouping
28-36: LGTM! Proper orthanq call rule structure.The
orthanq_callrule correctly:
- Takes the gathered HLA variants as input
- Uses sample-specific wildcards for personalized HLA typing
- Outputs to the expected TSV format that matches the final output pattern
- Includes proper logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (1)
workflow/rules/hla_typing.smk (1)
1-14: Log both downloads and prefer HTTPS if available.Currently only the second wget redirects stderr to log. Make logging consistent. If HTTPS endpoints exist for IMGT/HLA, prefer them.
Apply:
- shell: - "wget -c {params.genes_link} -O {output.genes} && " - "wget -c {params.xml_link} -O {output.xml} 2> {log}" + shell: + "(wget -c {params.genes_link} -O {output.genes} 2> {log}) && " + "(wget -c {params.xml_link} -O {output.xml} 2>> {log})"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
config/config.yaml(1 hunks)workflow/envs/orthanq.yaml(1 hunks)workflow/rules/common.smk(3 hunks)workflow/rules/hla_typing.smk(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- workflow/envs/orthanq.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- config/config.yaml
🧰 Additional context used
🪛 GitHub Actions: Tests
workflow/rules/hla_typing.smk
[error] 35-35: KeyError in file "/home/runner/work/dna-seq-varlociraptor/dna-seq-varlociraptor/workflow/rules/hla_typing.smk", line 35: 'hla_typing'
🔇 Additional comments (2)
workflow/rules/hla_typing.smk (1)
64-84: Conda env reference check.Ensure
../envs/orthanq.yamlexists and pins a compatible orthanq version; otherwise, Snakemake will fail at runtime.workflow/rules/common.smk (1)
774-781: Fix get_scatter_candidates_input return type and orthanq path pattern
- Call the inner function in the default branch:
return get_fixed_candidate_calls("bcf")(wildcards).- For OrthAnQ, return a pattern matching the scatter wildcard:
def get_scatter_candidates_input(wildcards): if wildcards.caller == "orthanq": - return "results/candidate-calls/orthanq.bcf" + return "results/candidate-calls/orthanq.{scatteritem}.bcf" elif config.get("target_regions"): return "results/candidate-calls/{group}.{caller}.filtered.bcf" else: - return get_fixed_candidate_calls("bcf") + return get_fixed_candidate_calls("bcf")(wildcards)Verify that the scatter rule’s wildcard name (
scatteritem) matches the actual scatter dimension (e.g.locus); adjust the placeholder if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workflow/rules/common.smk (1)
1008-1016: Add missing 'orthanq' to caller and 'hla-variants' to calling_type (workflow/rules/common.smk)Apply:
workflow/rules/common.smk @@ -1008,7 +1008,7 @@ wildcard_constraints: - caller="|".join(["freebayes", "delly", "arriba"]), + caller="|".join(["freebayes", "delly", "arriba", "orthanq"]), @@ -1014,7 +1014,7 @@ wildcard_constraints: - calling_type="|".join(["fusions", "variants"]), + calling_type="|".join(["fusions", "variants", "hla-variants"]),
🧹 Nitpick comments (1)
workflow/rules/common.smk (1)
175-258: Ensure correct group set for new calling_type “hla-variants”.Blocks selecting group fallback to fusions_groups for any non-“variants” type. That will route “hla-variants” to the wrong groups. Treat “hla-variants” like “variants”.
Apply at each occurrence shown below:
- group=( - variants_groups - if calling_type == "variants" - else fusions_groups - ), + group=( + variants_groups + if calling_type in ["variants", "hla-variants"] + else fusions_groups + ),Repeat for the same pattern in lines 205-216, 218-231, 233-257, and 247-257.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/rules/common.smk(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-25T12:34:09.018Z
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#338
File: workflow/rules/candidate_calling.smk:86-87
Timestamp: 2024-11-25T12:34:09.018Z
Learning: In this workflow, the grouping of variant calling rules is intentional, and not all related variant calling rules need to be grouped under "calling".
Applied to files:
workflow/rules/common.smk
🔇 Additional comments (1)
workflow/rules/common.smk (1)
714-725: Good mapping for “hla-variants” → “orthanq”, but update wildcard constraints.The match/case looks fine. However, without adding “orthanq” to caller and “hla-variants” to calling_type in wildcard_constraints, scheduling will fail.
See concrete fix suggested on Lines 1008-1016.
…e for scattering orthanq candidate calls, extend wildcard constraints, configure input function of varlociraptor preprocess to retrieve orthanq candidate calls and fix final output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
workflow/rules/hla_typing.smk (2)
1-13: Fix typo and capture all wget output in logs.The comment has a typo and only the second wget command's stderr is captured.
-# dowloads latest IMGT/HLA database version +# downloads latest IMGT/HLA database version rule get_hla_genes_and_xml: output: genes="results/preparation/hla_gen.fasta", xml="results/preparation/hla.xml.zip", log: "logs/get_hla_genes_and_xml.log", params: genes_link="ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla_gen.fasta", xml_link="ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/xml/hla.xml.zip", shell: - "wget -c {params.genes_link} -O {output.genes} && " - "wget -c {params.xml_link} -O {output.xml} 2> {log}" + "(wget -c {params.genes_link} -O {output.genes} && " + "wget -c {params.xml_link} -O {output.xml}) 2> {log}"
105-107: Address TODO items for complete HLA typing workflow.The TODOs indicate missing functionality:
- Additional outputs (plots) are not generated
- Handling of bias deactivation in varlociraptor needs to be decided
Consider implementing these features or creating GitHub issues to track them.
Would you like me to help implement the missing plot generation or create issues to track these TODOs?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/common.smk(8 hunks)workflow/rules/hla_typing.smk(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/common.smk
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-25T12:34:09.018Z
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#338
File: workflow/rules/candidate_calling.smk:86-87
Timestamp: 2024-11-25T12:34:09.018Z
Learning: In this workflow, the grouping of variant calling rules is intentional, and not all related variant calling rules need to be grouped under "calling".
Applied to files:
workflow/rules/hla_typing.smk
🪛 GitHub Actions: Tests
workflow/rules/hla_typing.smk
[error] 37-37: KeyError in hla_typing workflow rule: the key 'hla_typing' is not found in the expected dictionary. Snakemake exited with code 1 while evaluating the workflow (command: snakemake --directory .test --snakefile workflow/Snakefile --show-failed-logs --conda-cleanup-pkgs cache --sdm conda --cores 10 results/testcases/one/freebayes/IX:314200 --configfile .test/config-simple/config.yaml).
🔇 Additional comments (3)
workflow/rules/hla_typing.smk (3)
35-38: Fix KeyError by using safe config access.The rule fails with KeyError when
hla_typingconfig section is missing. Use safe dictionary access to handle missing configuration gracefully.output: bcfs=expand( "results/orthanq-candidate-calls/{caller}.hla-variants.bcf", - caller=[f"orthanq-{locus}" for locus in config["hla_typing"].get("loci")], + caller=[f"orthanq-{locus}" for locus in config.get("hla_typing", {}).get("loci", [])], ),
1-1: Add missing import for os module.The
osmodule is used in lines 24 and 44 but never imported.Add this import at the top of the file:
+import os + # dowloads latest IMGT/HLA database version
16-26: Redirect unzip stderr to log file.The unzip command should capture stderr for debugging failed extractions.
shell: - "unzip -o {input} -d {params.path_to_unzip}" + "unzip -o {input} -d {params.path_to_unzip} 2> {log}"Based on learnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
workflow/rules/common.smk (4)
721-734: Reuse orthanq_callers for hla-variants to avoid re-parsing config.Keeps behavior consistent and benefits from earlier normalization.
- case "hla-variants": - caller = [ - f"orthanq-{locus}" for locus in config["hla_typing"].get("loci", []) - ] + case "hla-variants": + caller = orthanq_callers
783-791: Remove dead commented code block or add a note if intentionally kept.The old get_scatter_candidates_input is commented out. Prefer deleting to reduce noise. If you plan to revive it, ensure the else branch calls get_fixed_candidate_calls("bcf")(wildcards) instead of returning the function object.
1526-1533: Clamp -T to a sane minimum to avoid negative/zero thresholds.If min_primer_len is very small, min_primer_len-2 can go below 0; clamp to ≥2.
- if min_primer_len < 32: - extra += f" -T {min_primer_len-2}" + if min_primer_len < 32: + extra += f" -T {max(2, min_primer_len - 2)}"
792-802: Use exact membership check againstorthanq_callers
Replace the prefix-based check with a membership test to avoid false positives:- if is_activated("hla_typing") and wc.caller.startswith("orthanq-"): + if is_activated("hla_typing") and wc.caller in orthanq_callers: return "results/orthanq-candidate-calls/{caller}.hla-variants.{scatteritem}.bcf"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/common.smk(8 hunks)workflow/rules/hla_typing.smk(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/rules/hla_typing.smk
🔇 Additional comments (4)
workflow/rules/common.smk (4)
995-998: LGTM: candidate filter aux files list comprehension is correct.
1006-1009: LGTM: candidate filter aux args zip aligns with aux files.Ensure get_filter_aux_entries("candidates") and input.aux have equal length in the rule wiring.
1025-1033: Add 'hla-variants' to wildcard_constraints.calling_type.Without this, rules using calling_type="hla-variants" will violate constraints and fail. This mirrors the new branch in get_scattered_calls.
wildcard_constraints: @@ - calling_type="|".join(["fusions", "variants"]), + calling_type="|".join(["fusions", "variants", "hla-variants"]),Based on learnings
170-181: Use the normalized orthanq_callers and guard missing loci.This repeats the raw config access and will break if loci is missing or a string. Reuse orthanq_callers.
- locus=[ - f"orthanq-{locus}" for locus in config["hla_typing"].get("loci") - ], + locus=orthanq_callers,Based on learnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.test/config_primers/config.yaml (1)
41-47: LGTM. Thehla_typingblock is well-structured and correctly disabled (activate: false) in this primers-only test config; no existing.test/config enables HLA typing—consider adding a dedicated test config withhla_typing.activate: trueto validate the new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
workflow/rules/common.smk (1)
1032-1032: Add "hla-variants" tocalling_typewildcard constraint.Line 721-733 handle
calling_type="hla-variants"in the match/case block, but line 1032 still only allows "fusions" and "variants". This will cause constraint errors.Apply this diff:
- calling_type="|".join(["fusions", "variants"]), + calling_type="|".join(["fusions", "variants", "hla-variants"]),
♻️ Duplicate comments (5)
workflow/rules/hla_typing.smk (3)
26-26: Redirect stderr to log file.The
unzipcommand should redirect its output to the log file for proper error tracking.Apply this diff:
- "unzip -o {input} -d {params.path_to_unzip}" + "unzip -o {input} -d {params.path_to_unzip} 2> {log}"
35-38: Safeguard against missinghla_typingconfig.Line 37 accesses
config["hla_typing"]directly, which will raise aKeyErrorif the section is missing or HLA typing is deactivated. Use defensive access.Apply this diff:
bcfs=expand( "results/orthanq-candidate-calls/{caller}.hla-variants.bcf", - caller=[f"orthanq-{locus}" for locus in config["hla_typing"].get("loci")], + caller=[f"orthanq-{locus}" for locus in config.get("hla_typing", {}).get("loci", [])], ),
1-103: Add missingosimport at the top of the file.Line 24 uses
os.path.dirnameand line 44 usesos.path.dirname, but theosmodule is not imported. This will cause aNameErrorat runtime.Add this import at the top of the file:
+import os + # dowloads latest IMGT/HLA database version rule get_hla_genes_and_xml:workflow/rules/common.smk (2)
47-50: Normalizehla_locito prevent character-wise iteration.If
config["hla_typing"]["loci"]is a string rather than a list, the list comprehension on line 49 will iterate over individual characters, producing callers likeorthanq-A,orthanq-B, etc. (one per character) instead oforthanq-HLA-A, etc.Based on learnings
Apply this diff to normalize and deduplicate:
# hla typing -hla_loci = config.get("hla_typing", {}).get("loci", []) -orthanq_callers = [f"orthanq-{locus}" for locus in hla_loci] +_raw_loci = config.get("hla_typing", {}).get("loci", []) or [] +hla_loci = [_raw_loci] if isinstance(_raw_loci, str) else list(_raw_loci) +# ensure stable order and uniqueness +hla_loci = list(dict.fromkeys(hla_loci)) +orthanq_callers = [f"orthanq-{locus}" for locus in hla_loci]
170-181: Safeguard loci lookup and normalize before passing toexpand.If
config["hla_typing"]["loci"]is missing, None, or a string, theexpandcall will fail. Fetch and normalize loci before using them.Based on learnings
Apply this diff:
if is_activated("hla_typing"): - loci = [f"orthanq-{locus}" for locus in config["hla_typing"].get("loci", [])] + _raw_loci = config.get("hla_typing", {}).get("loci", []) or [] + loci_list = [_raw_loci] if isinstance(_raw_loci, str) else list(_raw_loci) + loci = [f"orthanq-{locus}" for locus in loci_list] for group in samples["group"].unique(): final_output.extend( expand( "results/hla-typing/{group}-{locus}/{alias}/{alias}-{locus}.csv", alias=get_group_aliases(group), group=group, locus=loci, ) )
🧹 Nitpick comments (1)
workflow/rules/hla_typing.smk (1)
105-107: Address TODO comments before merging.The TODO comments indicate missing functionality (plots, inputs, commands) and unresolved design decisions about handling varlociraptor biases. Consider opening issues to track these items.
Would you like me to help generate tracking issues for these TODOs?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/common.smk(8 hunks)workflow/rules/hla_typing.smk(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: test target regions, multiple BEDs
- GitHub Check: test no candidate filtering
- GitHub Check: test testcase generation
- GitHub Check: test primers
- GitHub Check: test sra download
- GitHub Check: test local input
- GitHub Check: test target regions
🔇 Additional comments (10)
workflow/rules/common.smk (5)
792-801: LGTM onget_candidate_callsorthanq routing.The conditional correctly routes orthanq callers to their specific candidate path when HLA typing is activated and the caller starts with "orthanq-".
721-734: LGTM onget_scattered_callsmatch/case structure.The match/case block properly handles the three calling types ("fusions", "variants", "hla-variants") and raises a clear error for unexpected values. The orthanq caller list is correctly constructed from configured loci.
995-997: Minor formatting change is acceptable.The list comprehension reformatting improves readability without changing logic.
1006-1009: Minor formatting change is acceptable.The
zipcall reformatting improves readability without changing logic.
1530-1530: Minor spacing change is acceptable.The spacing adjustment in the arithmetic expression
min_primer_len-2tomin_primer_len - 2improves readability.workflow/rules/hla_typing.smk (5)
2-13: LGTM onget_hla_genes_and_xmlrule.The rule correctly downloads the latest IMGT/HLA database files using wget with proper output paths and logging.
29-48: LGTM onorthanq_candidate_variantscommand structure.The orthanq candidates command is properly structured with all required inputs and parameters. The multi-threaded execution and BCF output format are appropriately configured.
52-64: LGTM onscatter_candidates_orthanqrule.The rule correctly uses
rbt vcf-splitto scatter candidate BCFs per locus without requiring group wildcards.
67-80: LGTM ongather_annotated_calls_orthanqrule.The gather pattern now correctly aligns with the common scheme using
results/calls/{{group}}.{{caller}}.{scatteritem}.bcfinstead of including the locus wildcard. This matches the expected pattern fromget_scattered_callsin common.smk.
83-102: LGTM onorthanq_callrule.The rule correctly invokes
orthanq call hlawith all required inputs and produces multiple output files. The wildcard usage is now correct (no typo in haplotype_calls path).
|
Just a meta-comment without having looked into the details: Wouldn't we maybe want to make this a little workflow on its own and only include this as a module here? Just to avoid further complicating this workflow's tool dependencies? |
|
Initially, yes, that was the main idea but now we mainly want to skip repeating both the mapping and Varlociraptor steps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (2)
workflow/rules/hla_typing.smk (1)
16-26: Add missing os import and redirect stderr to log.This rule uses
os.path.dirnamewithout importingosat the top of the file, which will cause aNameError. Additionally, the unzip command should redirect its output to the log.Apply this diff at the top of the file:
+import os + # dowloads latest IMGT/HLA database version rule get_hla_genes_and_xml:And update the shell command:
- "unzip -o {input} -d {params.path_to_unzip}" + "unzip -o {input} -d {params.path_to_unzip} 2> {log}"Based on learnings
workflow/rules/common.smk (1)
1028-1028: Include “hla-variants” in calling_type wildcard constraints
In workflow/rules/common.smk (around line 1032), the calling_type pattern currently joins ["fusions","variants"]; add "hla-variants" to this list.
🧹 Nitpick comments (4)
workflow/rules/common.smk (1)
783-790: Remove or uncomment dead code.This commented function appears to be an earlier implementation of scatter candidate input routing. Either uncomment and integrate it if needed, or remove it to keep the codebase clean.
workflow/rules/hla_typing.smk (3)
47-55: Consider using symlink instead of copy for efficiency.Using
cpcreates a full copy of the BCF file. If these files are large, consider using a symbolic link instead.- "cp {input}/{wildcards.locus}.bcf {output}" + "ln -sf $(realpath {input}/{wildcards.locus}.bcf) {output}"This saves disk space and is faster, though it requires the temp directory to exist until all downstream rules complete.
58-71: Add index output for downstream compatibility.BCF files typically need index files (
.csi) for efficient access. The scatter operation should produce both the BCF and its index.rule scatter_candidates_orthanq: input: "results/orthanq-candidate-calls/{orthanq_locus}.hla-variants.bcf", output: scatter.calling( "results/orthanq-candidate-calls/{{orthanq_locus}}.hla-variants.{scatteritem}.bcf" ), + scatter.calling( + "results/orthanq-candidate-calls/{{orthanq_locus}}.hla-variants.{scatteritem}.bcf.csi" + ),However, verify if
rbt vcf-splitautomatically creates indexes.
107-109: Address TODO comments before merging.The TODO comments indicate incomplete implementation. Consider whether these should be addressed in this PR or tracked in a separate issue.
Do you want me to open a GitHub issue to track these TODO items, or should they be addressed in this PR?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
workflow/rules/common.smk(8 hunks)workflow/rules/hla_typing.smk(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: test sra download
- GitHub Check: test testcase generation
- GitHub Check: test target regions
- GitHub Check: test target regions, multiple BEDs
- GitHub Check: test local input
- GitHub Check: test primers
- GitHub Check: test no candidate filtering
🔇 Additional comments (7)
workflow/rules/common.smk (5)
792-801: Approve orthanq routing with minor observation.The logic correctly routes orthanq callers to a different candidate path. The
startswith("orthanq-")check is appropriate given the naming convention.One minor note: The function returns early for orthanq but the subsequent
ifat line 798 doesn't have anelif, which is fine but slightly less explicit about mutual exclusivity.
995-997: Formatting change looks good.The list comprehension is more readable on multiple lines when it gets long.
1006-1009: Formatting change looks good.Consistent with the change at lines 995-997.
1530-1530: Formatting change looks good.The space removal around the subtraction doesn't affect correctness and follows Python style.
721-734: LGTM but validate config['hla_typing'].loci
Ensureconfig["hla_typing"].get("loci")returns a list of strings (not a single string) to avoid iterating over characters when building HLA variant callers.workflow/rules/hla_typing.smk (2)
74-87: LGTM!The gather rule correctly uses the gather helper and includes both BCF and CSI inputs, which is good practice for BCF operations.
90-106: Verify orthanq wrapper version and directory output handling
Thev7.6.2/bio/orthanqwrapper tag wasn't found—confirm the tag exists in the snakemake-wrappers repo and that this wrapper supports directory outputs.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…-varlociraptor into feat/orthanq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
workflow/rules/trimming.smk (1)
8-8: Wrapper bump is valid, no breaking changes detected.
The v7.7.0 tag exists and the bio/sra-tools/fasterq-dump wrapper remains compatible. Since this update seems unrelated to the HLA typing integration, consider moving it into a standalone maintenance PR.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/rules/trimming.smk(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: test no candidate filtering
- GitHub Check: test local input
- GitHub Check: test testcase generation
- GitHub Check: test target regions, multiple BEDs
- GitHub Check: test primers
- GitHub Check: test target regions
- GitHub Check: test sra download
Summary by CodeRabbit
New Features
Chores
Notes