Skip to content

Conversation

@johanneskoester
Copy link
Contributor

@johanneskoester johanneskoester commented Sep 20, 2025

Summary by CodeRabbit

  • New Features

    • Optional HLA typing pipeline (disabled by default); configurable loci (A, B, C, DQB1).
    • Workflow now includes HLA-typing rules: reference download/unpack, per‑locus candidate generation, scatter/gather and consolidated typing outputs.
  • Chores

    • Added dedicated conda environment pinning orthanq 1.15.0.
    • Updated sra-tools wrapper to a newer version.
    • CI matrix cleanup flag adjusted for SRA download case.
  • Notes

    • No behavior changes unless HLA typing is activated.

@coderabbitai
Copy link

coderabbitai bot commented Sep 20, 2025

Walkthrough

Adds optional HLA typing: new hla_typing config block and test fixtures, includes rules/hla_typing.smk in the Snakefile, extends common workflow wiring for orthanq callers and outputs, adds HLA-specific Snakemake rules and a conda env for orthanq, and appends HLA outputs to final outputs when enabled.

Changes

Cohort / File(s) Summary
Configuration: HLA typing
config/config.yaml, .test/config_primers/config.yaml, .test/config-chm-eval/config.yaml, .test/config-giab/config.yaml, .test/config-no-candidate-filtering/config.yaml, .test/config-simple/config.yaml, .test/config-sra/config.yaml, .test/config-target-regions/config.yaml, .test/config-target-regions/config_multiple_beds.yaml
Adds top-level hla_typing block with activate (bool, default false) and loci (list, e.g., ["A","B","C","DQB1"]) to configuration and test fixtures.
Workflow assembly
workflow/Snakefile
Adds include: "rules/hla_typing.smk" so HLA rules are loaded during workflow assembly.
Common utilities and wiring
workflow/rules/common.smk
Adds public variables hla_loci and orthanq_callers; extends wildcard caller constraints to include orthanq callers; updates get_final_output to append per-group per-locus HLA outputs when enabled; refactors get_scattered_calls mapping and routes orthanq-* callers in get_candidate_calls; minor formatting tweaks.
HLA typing rules
workflow/rules/hla_typing.smk
Adds rules: get_hla_genes_and_xml, unzip_xml, orthanq_candidate_variants, rename_candidate_bcf, scatter_candidates_orthanq, gather_annotated_calls_orthanq, orthanq_call_hla — implements download/unzip, per-locus candidate generation, scatter/gather, and orthanq call outputs (per group/locus).
Conda environment for orthanq
workflow/envs/orthanq.yaml
New conda env file specifying channels conda-forge, bioconda and orthanq=1.15.0.
Tool update
workflow/rules/trimming.smk
Bumps sra-tools wrapper version used by get_sra from v5.0.2 to v7.7.0 (fasterq-dump wrapper).
CI workflow tweak
.github/workflows/main.yml
Sets cleanup flag true for the "sra download" matrix case so Free Disk Space step (and cleanup path) runs for that case.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Snakefile
  participant Common as common.smk
  participant HLA as hla_typing.smk
  participant Conda as envs/orthanq.yaml

  User->>Snakefile: run workflow
  Snakefile->>Common: load config (hla_typing.activate, loci)
  alt hla_typing.activate == true
    Snakefile->>HLA: include HLA rules
    HLA->>HLA: get_hla_genes_and_xml -> hla_gen.fasta, hla.xml.zip
    HLA->>HLA: unzip_xml -> hla.xml
    HLA->>Conda: run orthanq candidate & call steps (orthanq env)
    HLA->>HLA: orthanq_candidate_variants (per locus) -> candidate BCFs
    HLA->>HLA: scatter / gather -> per-group BCFs
    HLA->>HLA: orthanq_call_hla -> results/hla-typing/{group}-{caller}/...
    Common->>Common: append HLA outputs to final output list
  else
    Common->>Common: proceed without HLA rules/outputs
  end
Loading
sequenceDiagram
  autonumber
  participant Common as common.smk
  participant Scatter as get_scattered_calls
  participant Orthanq as orthanq-<locus>
  participant Variants as variant_caller
  participant Fusions as arriba

  Common->>Scatter: get_scattered_calls(type)
  alt type == "hla-variants"
    Scatter->>Orthanq: map to orthanq per-locus callers
  else type == "variants"
    Scatter->>Variants: map to configured variant_caller
  else type == "fusions"
    Scatter->>Fusions: map to arriba
  else
    Scatter-->>Common: raise ValueError
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • dlaehnemann

Poem

I nibble keys and twitch my ear,
New HLA lanes then hoppity-cheer.
Fetch, unzip, split, and call with grace,
Loci hop and scatter through the place.
A rabbit's thump — outputs fall in place 🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately and succinctly describes the key feature of the pull request—integrating HLA typing with the OrthAnQ tool—providing clear context without extraneous detail.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/orthanq

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ee2287 and 2050598.

📒 Files selected for processing (1)
  • .github/workflows/main.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: test target regions
  • GitHub Check: test testcase generation
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test sra download
  • GitHub Check: test primers
  • GitHub Check: test no candidate filtering
  • GitHub Check: test local input
🔇 Additional comments (1)
.github/workflows/main.yml (1)

63-63: Change looks good

Enabling cleanup for the SRA download case matches the resource-heavy nature of that scenario and ensures the free-disk-space step runs when needed.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
workflow/rules/hla_typing.smk (1)

39-41: Address TODOs for production readiness.

The TODO comments highlight important missing functionality that should be addressed before this feature is complete:

  1. Additional outputs like plots
  2. Missing inputs and commands for the orthanq_candidate_variants rule
  3. Handling deactivation of biases in varlociraptor via INFO tags

Would you like me to help research the complete orthanq command-line interface and create a more complete implementation for the orthanq_candidate_variants rule?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee49828 and 1a8c8f6.

📒 Files selected for processing (5)
  • config/config.yaml (1 hunks)
  • workflow/Snakefile (1 hunks)
  • workflow/rules/candidate_calling.smk (1 hunks)
  • workflow/rules/common.smk (3 hunks)
  • workflow/rules/hla_typing.smk (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-25T12:34:09.018Z
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#338
File: workflow/rules/candidate_calling.smk:86-87
Timestamp: 2024-11-25T12:34:09.018Z
Learning: In this workflow, the grouping of variant calling rules is intentional, and not all related variant calling rules need to be grouped under "calling".

Applied to files:

  • workflow/rules/hla_typing.smk
🪛 GitHub Actions: Tests
workflow/Snakefile

[error] 1-1: Snakemake failed during execution. Command: snakemake --directory .test --snakefile workflow/Snakefile --show-failed-logs --conda-cleanup-pkgs cache --sdm conda --cores 10 --configfile .test/config-sra/config.yaml

workflow/rules/hla_typing.smk

[error] 1-1: RuleException: Input and output files have to be specified as strings or lists of strings.

🔇 Additional comments (8)
config/config.yaml (1)

81-82: Good integration of HLA typing feature flag.

The addition of the hla_typing.activate boolean flag follows the established pattern used throughout the config for feature toggles and provides a clean way to enable/disable HLA typing functionality.

workflow/Snakefile (1)

47-47: LGTM! Proper inclusion of HLA typing rules.

The addition of include: "rules/hla_typing.smk" follows the established pattern and correctly integrates the HLA typing functionality into the workflow.

workflow/rules/candidate_calling.smk (1)

75-75: Good refactoring to use centralized input selection.

The change from conditional input selection to using get_scatter_candidates_input centralizes the logic and supports the new HLA typing workflow. This is a clean refactoring that improves maintainability.

workflow/rules/common.smk (3)

713-723: Excellent pattern matching implementation for caller selection.

The use of structural pattern matching with explicit handling for "fusions", "variants", and the new "hla-variants" case is clean and follows Python best practices. The addition of "orthanq" as the caller for "hla-variants" aligns well with Orthanq being a statistical model for transparent and uncertainty-aware quantification of haplotypes used for HLA typing.


773-779: Well-designed helper function for candidate input selection.

The get_scatter_candidates_input function provides a clean abstraction for determining the appropriate input source based on the caller type and configuration. The logic correctly handles:

  • orthanq-specific candidate files
  • target region filtering
  • fallback to fixed candidate calls

This design supports the integration of HLA typing while maintaining backward compatibility.


166-172: HLA typing output integration verified. The expand call in workflow/rules/common.smk (results/hla-typing/{item.group}.{item.sample}.tsv) matches the orthanq_call rule output in workflow/rules/hla_typing.smk (results/hla-typing/{group}.{sample}.tsv).

workflow/rules/hla_typing.smk (2)

12-25: LGTM! Well-structured rule for gathering HLA variant calls.

The gather_annotated_calls rule correctly:

  • Uses the scatter-gather pattern with gather.calling()
  • Follows established patterns from other gather rules in the codebase
  • Uses the appropriate bcftools/concat wrapper
  • Includes proper logging and grouping

28-36: LGTM! Proper orthanq call rule structure.

The orthanq_call rule correctly:

  • Takes the gathered HLA variants as input
  • Uses sample-specific wildcards for personalized HLA typing
  • Outputs to the expected TSV format that matches the final output pattern
  • Includes proper logging

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
workflow/rules/hla_typing.smk (1)

1-14: Log both downloads and prefer HTTPS if available.

Currently only the second wget redirects stderr to log. Make logging consistent. If HTTPS endpoints exist for IMGT/HLA, prefer them.

Apply:

-    shell:
-        "wget -c {params.genes_link} -O {output.genes} && "
-        "wget -c {params.xml_link} -O {output.xml} 2> {log}"
+    shell:
+        "(wget -c {params.genes_link} -O {output.genes} 2> {log}) && "
+        "(wget -c {params.xml_link} -O {output.xml} 2>> {log})"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a8c8f6 and 2437682.

📒 Files selected for processing (4)
  • config/config.yaml (1 hunks)
  • workflow/envs/orthanq.yaml (1 hunks)
  • workflow/rules/common.smk (3 hunks)
  • workflow/rules/hla_typing.smk (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • workflow/envs/orthanq.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • config/config.yaml
🧰 Additional context used
🪛 GitHub Actions: Tests
workflow/rules/hla_typing.smk

[error] 35-35: KeyError in file "/home/runner/work/dna-seq-varlociraptor/dna-seq-varlociraptor/workflow/rules/hla_typing.smk", line 35: 'hla_typing'

🔇 Additional comments (2)
workflow/rules/hla_typing.smk (1)

64-84: Conda env reference check.

Ensure ../envs/orthanq.yaml exists and pins a compatible orthanq version; otherwise, Snakemake will fail at runtime.

workflow/rules/common.smk (1)

774-781: Fix get_scatter_candidates_input return type and orthanq path pattern

  • Call the inner function in the default branch: return get_fixed_candidate_calls("bcf")(wildcards).
  • For OrthAnQ, return a pattern matching the scatter wildcard:
 def get_scatter_candidates_input(wildcards):
     if wildcards.caller == "orthanq":
-        return "results/candidate-calls/orthanq.bcf"
+        return "results/candidate-calls/orthanq.{scatteritem}.bcf"
     elif config.get("target_regions"):
         return "results/candidate-calls/{group}.{caller}.filtered.bcf"
     else:
-        return get_fixed_candidate_calls("bcf")
+        return get_fixed_candidate_calls("bcf")(wildcards)

Verify that the scatter rule’s wildcard name (scatteritem) matches the actual scatter dimension (e.g. locus); adjust the placeholder if needed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
workflow/rules/common.smk (1)

1008-1016: Add missing 'orthanq' to caller and 'hla-variants' to calling_type (workflow/rules/common.smk)

Apply:

 workflow/rules/common.smk
@@ -1008,7 +1008,7 @@ wildcard_constraints:
-    caller="|".join(["freebayes", "delly", "arriba"]),
+    caller="|".join(["freebayes", "delly", "arriba", "orthanq"]),
@@ -1014,7 +1014,7 @@ wildcard_constraints:
-    calling_type="|".join(["fusions", "variants"]),
+    calling_type="|".join(["fusions", "variants", "hla-variants"]),
🧹 Nitpick comments (1)
workflow/rules/common.smk (1)

175-258: Ensure correct group set for new calling_type “hla-variants”.

Blocks selecting group fallback to fusions_groups for any non-“variants” type. That will route “hla-variants” to the wrong groups. Treat “hla-variants” like “variants”.

Apply at each occurrence shown below:

-                            group=(
-                                variants_groups
-                                if calling_type == "variants"
-                                else fusions_groups
-                            ),
+                            group=(
+                                variants_groups
+                                if calling_type in ["variants", "hla-variants"]
+                                else fusions_groups
+                            ),

Repeat for the same pattern in lines 205-216, 218-231, 233-257, and 247-257.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2437682 and 3e22f3b.

📒 Files selected for processing (1)
  • workflow/rules/common.smk (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-25T12:34:09.018Z
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#338
File: workflow/rules/candidate_calling.smk:86-87
Timestamp: 2024-11-25T12:34:09.018Z
Learning: In this workflow, the grouping of variant calling rules is intentional, and not all related variant calling rules need to be grouped under "calling".

Applied to files:

  • workflow/rules/common.smk
🔇 Additional comments (1)
workflow/rules/common.smk (1)

714-725: Good mapping for “hla-variants” → “orthanq”, but update wildcard constraints.

The match/case looks fine. However, without adding “orthanq” to caller and “hla-variants” to calling_type in wildcard_constraints, scheduling will fail.

See concrete fix suggested on Lines 1008-1016.

…e for scattering orthanq candidate calls, extend wildcard constraints, configure input function of varlociraptor preprocess to retrieve orthanq candidate calls and fix final output
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
workflow/rules/hla_typing.smk (2)

1-13: Fix typo and capture all wget output in logs.

The comment has a typo and only the second wget command's stderr is captured.

-# dowloads latest IMGT/HLA database version
+# downloads latest IMGT/HLA database version
 rule get_hla_genes_and_xml:
     output:
         genes="results/preparation/hla_gen.fasta",
         xml="results/preparation/hla.xml.zip",
     log:
         "logs/get_hla_genes_and_xml.log",
     params:
         genes_link="ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla_gen.fasta",
         xml_link="ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/xml/hla.xml.zip",
     shell:
-        "wget -c {params.genes_link} -O {output.genes} && "
-        "wget -c {params.xml_link} -O {output.xml} 2> {log}"
+        "(wget -c {params.genes_link} -O {output.genes} && "
+        "wget -c {params.xml_link} -O {output.xml}) 2> {log}"

105-107: Address TODO items for complete HLA typing workflow.

The TODOs indicate missing functionality:

  1. Additional outputs (plots) are not generated
  2. Handling of bias deactivation in varlociraptor needs to be decided

Consider implementing these features or creating GitHub issues to track them.

Would you like me to help implement the missing plot generation or create issues to track these TODOs?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e22f3b and 82bc1b4.

📒 Files selected for processing (2)
  • workflow/rules/common.smk (8 hunks)
  • workflow/rules/hla_typing.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/rules/common.smk
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-11-25T12:34:09.018Z
Learnt from: johanneskoester
PR: snakemake-workflows/dna-seq-varlociraptor#338
File: workflow/rules/candidate_calling.smk:86-87
Timestamp: 2024-11-25T12:34:09.018Z
Learning: In this workflow, the grouping of variant calling rules is intentional, and not all related variant calling rules need to be grouped under "calling".

Applied to files:

  • workflow/rules/hla_typing.smk
🪛 GitHub Actions: Tests
workflow/rules/hla_typing.smk

[error] 37-37: KeyError in hla_typing workflow rule: the key 'hla_typing' is not found in the expected dictionary. Snakemake exited with code 1 while evaluating the workflow (command: snakemake --directory .test --snakefile workflow/Snakefile --show-failed-logs --conda-cleanup-pkgs cache --sdm conda --cores 10 results/testcases/one/freebayes/IX:314200 --configfile .test/config-simple/config.yaml).

🔇 Additional comments (3)
workflow/rules/hla_typing.smk (3)

35-38: Fix KeyError by using safe config access.

The rule fails with KeyError when hla_typing config section is missing. Use safe dictionary access to handle missing configuration gracefully.

     output:
         bcfs=expand(
             "results/orthanq-candidate-calls/{caller}.hla-variants.bcf",
-            caller=[f"orthanq-{locus}" for locus in config["hla_typing"].get("loci")],
+            caller=[f"orthanq-{locus}" for locus in config.get("hla_typing", {}).get("loci", [])],
         ),

1-1: Add missing import for os module.

The os module is used in lines 24 and 44 but never imported.

Add this import at the top of the file:

+import os
+
 # dowloads latest IMGT/HLA database version

16-26: Redirect unzip stderr to log file.

The unzip command should capture stderr for debugging failed extractions.

     shell:
-        "unzip -o {input} -d {params.path_to_unzip}"
+        "unzip -o {input} -d {params.path_to_unzip} 2> {log}"

Based on learnings

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
workflow/rules/common.smk (4)

721-734: Reuse orthanq_callers for hla-variants to avoid re-parsing config.

Keeps behavior consistent and benefits from earlier normalization.

-            case "hla-variants":
-                caller = [
-                    f"orthanq-{locus}" for locus in config["hla_typing"].get("loci", [])
-                ]
+            case "hla-variants":
+                caller = orthanq_callers

783-791: Remove dead commented code block or add a note if intentionally kept.

The old get_scatter_candidates_input is commented out. Prefer deleting to reduce noise. If you plan to revive it, ensure the else branch calls get_fixed_candidate_calls("bcf")(wildcards) instead of returning the function object.


1526-1533: Clamp -T to a sane minimum to avoid negative/zero thresholds.

If min_primer_len is very small, min_primer_len-2 can go below 0; clamp to ≥2.

-    if min_primer_len < 32:
-        extra += f" -T {min_primer_len-2}"
+    if min_primer_len < 32:
+        extra += f" -T {max(2, min_primer_len - 2)}"

792-802: Use exact membership check against orthanq_callers
Replace the prefix-based check with a membership test to avoid false positives:

-    if is_activated("hla_typing") and wc.caller.startswith("orthanq-"):
+    if is_activated("hla_typing") and wc.caller in orthanq_callers:
         return "results/orthanq-candidate-calls/{caller}.hla-variants.{scatteritem}.bcf"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82bc1b4 and ee1cbbf.

📒 Files selected for processing (2)
  • workflow/rules/common.smk (8 hunks)
  • workflow/rules/hla_typing.smk (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/rules/hla_typing.smk
🔇 Additional comments (4)
workflow/rules/common.smk (4)

995-998: LGTM: candidate filter aux files list comprehension is correct.


1006-1009: LGTM: candidate filter aux args zip aligns with aux files.

Ensure get_filter_aux_entries("candidates") and input.aux have equal length in the rule wiring.


1025-1033: Add 'hla-variants' to wildcard_constraints.calling_type.

Without this, rules using calling_type="hla-variants" will violate constraints and fail. This mirrors the new branch in get_scattered_calls.

 wildcard_constraints:
@@
-    calling_type="|".join(["fusions", "variants"]),
+    calling_type="|".join(["fusions", "variants", "hla-variants"]),

Based on learnings


170-181: Use the normalized orthanq_callers and guard missing loci.

This repeats the raw config access and will break if loci is missing or a string. Reuse orthanq_callers.

-                locus=[
-                    f"orthanq-{locus}" for locus in config["hla_typing"].get("loci")
-                ],
+                locus=orthanq_callers,

Based on learnings

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
.test/config_primers/config.yaml (1)

41-47: LGTM. The hla_typing block is well-structured and correctly disabled (activate: false) in this primers-only test config; no existing .test/ config enables HLA typing—consider adding a dedicated test config with hla_typing.activate: true to validate the new feature.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee1cbbf and 0ea01ce.

📒 Files selected for processing (1)
  • .test/config_primers/config.yaml (1 hunks)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
workflow/rules/common.smk (1)

1032-1032: Add "hla-variants" to calling_type wildcard constraint.

Line 721-733 handle calling_type="hla-variants" in the match/case block, but line 1032 still only allows "fusions" and "variants". This will cause constraint errors.

Apply this diff:

-    calling_type="|".join(["fusions", "variants"]),
+    calling_type="|".join(["fusions", "variants", "hla-variants"]),
♻️ Duplicate comments (5)
workflow/rules/hla_typing.smk (3)

26-26: Redirect stderr to log file.

The unzip command should redirect its output to the log file for proper error tracking.

Apply this diff:

-        "unzip -o {input} -d {params.path_to_unzip}"
+        "unzip -o {input} -d {params.path_to_unzip} 2> {log}"

35-38: Safeguard against missing hla_typing config.

Line 37 accesses config["hla_typing"] directly, which will raise a KeyError if the section is missing or HLA typing is deactivated. Use defensive access.

Apply this diff:

         bcfs=expand(
             "results/orthanq-candidate-calls/{caller}.hla-variants.bcf",
-            caller=[f"orthanq-{locus}" for locus in config["hla_typing"].get("loci")],
+            caller=[f"orthanq-{locus}" for locus in config.get("hla_typing", {}).get("loci", [])],
         ),

1-103: Add missing os import at the top of the file.

Line 24 uses os.path.dirname and line 44 uses os.path.dirname, but the os module is not imported. This will cause a NameError at runtime.

Add this import at the top of the file:

+import os
+
 # dowloads latest IMGT/HLA database version
 rule get_hla_genes_and_xml:
workflow/rules/common.smk (2)

47-50: Normalize hla_loci to prevent character-wise iteration.

If config["hla_typing"]["loci"] is a string rather than a list, the list comprehension on line 49 will iterate over individual characters, producing callers like orthanq-A, orthanq-B, etc. (one per character) instead of orthanq-HLA-A, etc.

Based on learnings

Apply this diff to normalize and deduplicate:

 # hla typing
-hla_loci = config.get("hla_typing", {}).get("loci", [])
-orthanq_callers = [f"orthanq-{locus}" for locus in hla_loci]
+_raw_loci = config.get("hla_typing", {}).get("loci", []) or []
+hla_loci = [_raw_loci] if isinstance(_raw_loci, str) else list(_raw_loci)
+# ensure stable order and uniqueness
+hla_loci = list(dict.fromkeys(hla_loci))
+orthanq_callers = [f"orthanq-{locus}" for locus in hla_loci]

170-181: Safeguard loci lookup and normalize before passing to expand.

If config["hla_typing"]["loci"] is missing, None, or a string, the expand call will fail. Fetch and normalize loci before using them.

Based on learnings

Apply this diff:

     if is_activated("hla_typing"):
-        loci = [f"orthanq-{locus}" for locus in config["hla_typing"].get("loci", [])]
+        _raw_loci = config.get("hla_typing", {}).get("loci", []) or []
+        loci_list = [_raw_loci] if isinstance(_raw_loci, str) else list(_raw_loci)
+        loci = [f"orthanq-{locus}" for locus in loci_list]
         for group in samples["group"].unique():
             final_output.extend(
                 expand(
                     "results/hla-typing/{group}-{locus}/{alias}/{alias}-{locus}.csv",
                     alias=get_group_aliases(group),
                     group=group,
                     locus=loci,
                 )
             )
🧹 Nitpick comments (1)
workflow/rules/hla_typing.smk (1)

105-107: Address TODO comments before merging.

The TODO comments indicate missing functionality (plots, inputs, commands) and unresolved design decisions about handling varlociraptor biases. Consider opening issues to track these items.

Would you like me to help generate tracking issues for these TODOs?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d29f552 and de3abe1.

📒 Files selected for processing (2)
  • workflow/rules/common.smk (8 hunks)
  • workflow/rules/hla_typing.smk (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test no candidate filtering
  • GitHub Check: test testcase generation
  • GitHub Check: test primers
  • GitHub Check: test sra download
  • GitHub Check: test local input
  • GitHub Check: test target regions
🔇 Additional comments (10)
workflow/rules/common.smk (5)

792-801: LGTM on get_candidate_calls orthanq routing.

The conditional correctly routes orthanq callers to their specific candidate path when HLA typing is activated and the caller starts with "orthanq-".


721-734: LGTM on get_scattered_calls match/case structure.

The match/case block properly handles the three calling types ("fusions", "variants", "hla-variants") and raises a clear error for unexpected values. The orthanq caller list is correctly constructed from configured loci.


995-997: Minor formatting change is acceptable.

The list comprehension reformatting improves readability without changing logic.


1006-1009: Minor formatting change is acceptable.

The zip call reformatting improves readability without changing logic.


1530-1530: Minor spacing change is acceptable.

The spacing adjustment in the arithmetic expression min_primer_len-2 to min_primer_len - 2 improves readability.

workflow/rules/hla_typing.smk (5)

2-13: LGTM on get_hla_genes_and_xml rule.

The rule correctly downloads the latest IMGT/HLA database files using wget with proper output paths and logging.


29-48: LGTM on orthanq_candidate_variants command structure.

The orthanq candidates command is properly structured with all required inputs and parameters. The multi-threaded execution and BCF output format are appropriately configured.


52-64: LGTM on scatter_candidates_orthanq rule.

The rule correctly uses rbt vcf-split to scatter candidate BCFs per locus without requiring group wildcards.


67-80: LGTM on gather_annotated_calls_orthanq rule.

The gather pattern now correctly aligns with the common scheme using results/calls/{{group}}.{{caller}}.{scatteritem}.bcf instead of including the locus wildcard. This matches the expected pattern from get_scattered_calls in common.smk.


83-102: LGTM on orthanq_call rule.

The rule correctly invokes orthanq call hla with all required inputs and produces multiple output files. The wildcard usage is now correct (no typo in haplotype_calls path).

@dlaehnemann
Copy link
Member

Just a meta-comment without having looked into the details: Wouldn't we maybe want to make this a little workflow on its own and only include this as a module here? Just to avoid further complicating this workflow's tool dependencies?

@huzuner
Copy link
Contributor

huzuner commented Oct 10, 2025

Initially, yes, that was the main idea but now we mainly want to skip repeating both the mapping and Varlociraptor steps.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
workflow/rules/hla_typing.smk (1)

16-26: Add missing os import and redirect stderr to log.

This rule uses os.path.dirname without importing os at the top of the file, which will cause a NameError. Additionally, the unzip command should redirect its output to the log.

Apply this diff at the top of the file:

+import os
+
 # dowloads latest IMGT/HLA database version
 rule get_hla_genes_and_xml:

And update the shell command:

-        "unzip -o {input} -d {params.path_to_unzip}"
+        "unzip -o {input} -d {params.path_to_unzip} 2> {log}"

Based on learnings

workflow/rules/common.smk (1)

1028-1028: Include “hla-variants” in calling_type wildcard constraints
In workflow/rules/common.smk (around line 1032), the calling_type pattern currently joins ["fusions","variants"]; add "hla-variants" to this list.

🧹 Nitpick comments (4)
workflow/rules/common.smk (1)

783-790: Remove or uncomment dead code.

This commented function appears to be an earlier implementation of scatter candidate input routing. Either uncomment and integrate it if needed, or remove it to keep the codebase clean.

workflow/rules/hla_typing.smk (3)

47-55: Consider using symlink instead of copy for efficiency.

Using cp creates a full copy of the BCF file. If these files are large, consider using a symbolic link instead.

-        "cp {input}/{wildcards.locus}.bcf {output}"
+        "ln -sf $(realpath {input}/{wildcards.locus}.bcf) {output}"

This saves disk space and is faster, though it requires the temp directory to exist until all downstream rules complete.


58-71: Add index output for downstream compatibility.

BCF files typically need index files (.csi) for efficient access. The scatter operation should produce both the BCF and its index.

 rule scatter_candidates_orthanq:
     input:
         "results/orthanq-candidate-calls/{orthanq_locus}.hla-variants.bcf",
     output:
         scatter.calling(
             "results/orthanq-candidate-calls/{{orthanq_locus}}.hla-variants.{scatteritem}.bcf"
         ),
+        scatter.calling(
+            "results/orthanq-candidate-calls/{{orthanq_locus}}.hla-variants.{scatteritem}.bcf.csi"
+        ),

However, verify if rbt vcf-split automatically creates indexes.


107-109: Address TODO comments before merging.

The TODO comments indicate incomplete implementation. Consider whether these should be addressed in this PR or tracked in a separate issue.

Do you want me to open a GitHub issue to track these TODO items, or should they be addressed in this PR?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de3abe1 and 927af17.

📒 Files selected for processing (2)
  • workflow/rules/common.smk (8 hunks)
  • workflow/rules/hla_typing.smk (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: test sra download
  • GitHub Check: test testcase generation
  • GitHub Check: test target regions
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test local input
  • GitHub Check: test primers
  • GitHub Check: test no candidate filtering
🔇 Additional comments (7)
workflow/rules/common.smk (5)

792-801: Approve orthanq routing with minor observation.

The logic correctly routes orthanq callers to a different candidate path. The startswith("orthanq-") check is appropriate given the naming convention.

One minor note: The function returns early for orthanq but the subsequent if at line 798 doesn't have an elif, which is fine but slightly less explicit about mutual exclusivity.


995-997: Formatting change looks good.

The list comprehension is more readable on multiple lines when it gets long.


1006-1009: Formatting change looks good.

Consistent with the change at lines 995-997.


1530-1530: Formatting change looks good.

The space removal around the subtraction doesn't affect correctness and follows Python style.


721-734: LGTM but validate config['hla_typing'].loci
Ensure config["hla_typing"].get("loci") returns a list of strings (not a single string) to avoid iterating over characters when building HLA variant callers.

workflow/rules/hla_typing.smk (2)

74-87: LGTM!

The gather rule correctly uses the gather helper and includes both BCF and CSI inputs, which is good practice for BCF operations.


90-106: Verify orthanq wrapper version and directory output handling
The v7.6.2/bio/orthanq wrapper tag wasn't found—confirm the tag exists in the snakemake-wrappers repo and that this wrapper supports directory outputs.

huzuner and others added 4 commits October 10, 2025 23:25
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
workflow/rules/trimming.smk (1)

8-8: Wrapper bump is valid, no breaking changes detected.
The v7.7.0 tag exists and the bio/sra-tools/fasterq-dump wrapper remains compatible. Since this update seems unrelated to the HLA typing integration, consider moving it into a standalone maintenance PR.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ecbf499 and 6ee2287.

📒 Files selected for processing (1)
  • workflow/rules/trimming.smk (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: test no candidate filtering
  • GitHub Check: test local input
  • GitHub Check: test testcase generation
  • GitHub Check: test target regions, multiple BEDs
  • GitHub Check: test primers
  • GitHub Check: test target regions
  • GitHub Check: test sra download

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants