feat: swap to bwa-mem2 for read-alignment #124
base: main
Walkthrough
This update replaces the existing BWA indexing and alignment rules (Snakemake wrapper version 1.8.0) with new rules using BWA-MEM2 (wrapper version 6.0.1). The new …
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant Workflow
    participant BWA-MEM2 Index
    participant BWA-MEM2 Mem
    User->>Workflow: Start workflow
    Workflow->>BWA-MEM2 Index: Index reference genome
    BWA-MEM2 Index-->>Workflow: Output index files
    Workflow->>BWA-MEM2 Mem: Align reads with BWA-MEM2
    BWA-MEM2 Mem-->>Workflow: Output .mem2.bam file
```
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 0
🧹 Nitpick comments (1)
workflow/rules/download.smk (1)
227-241: Consider adding benchmark information.
BWA-MEM2 is generally faster than regular BWA. Consider documenting the expected performance improvements or running benchmarks to quantify the benefits of this change.
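To make the performance claim concrete, Snakemake's `benchmark:` directive can record each job's runtime in a tab-separated file whose `s` column holds wall-clock seconds (with `repeat()`, one row per repetition). The following is a minimal sketch, assuming a `benchmark:` entry has been added to both the old BWA rule and the new BWA-MEM2 rule; the helper names and the file paths in the usage note are hypothetical.

```python
"""Compare wall-clock time recorded in two Snakemake benchmark files."""
import csv
import statistics


def mean_seconds(path: str) -> float:
    """Return the mean of the `s` (wall-clock seconds) column of a benchmark TSV."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    if not rows:
        raise ValueError(f"no benchmark rows in {path}")
    # One row per repetition when the rule uses benchmark: repeat(..., n).
    return statistics.mean(float(row["s"]) for row in rows)


def speedup(old_path: str, new_path: str) -> float:
    """Ratio of old to new mean runtime; a value above 1 means the new rule is faster."""
    return mean_seconds(old_path) / mean_seconds(new_path)
```

For example, `speedup("benchmarks/bwa_mem/S1.tsv", "benchmarks/bwa_mem2/S1.tsv")` (hypothetical paths) returns a value above 1 if BWA-MEM2 ran faster. The same files also contain a `max_rss` column, which would cover the memory side of the comparison.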
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/rules/download.smk (1 hunk)
🔇 Additional comments (5)
workflow/rules/download.smk (5)
181-207: Commented-out BWA rules look appropriate.
The old BWA index and alignment rules have been commented out rather than removed, which preserves the history and reference implementation. This is a good practice for significant tool changes.
209-225: BWA-MEM2 index rule correctly configured.
The index file extensions have been properly updated to match BWA-MEM2's requirements, including the addition of the `.0123` and `.bwt.2bit.64` files. The wrapper version has been updated to v6.0.1, which is appropriate for this new tool.
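For reference, the two index layouts can be listed side by side. This sketch assumes the standard outputs of `bwa index` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) and `bwa-mem2 index` (`.0123`, `.amb`, `.ann`, `.bwt.2bit.64`, `.pac`); the helper names and the example prefix are hypothetical.

```python
"""List the index files expected from `bwa-mem2 index` versus classic `bwa index`."""

# Assumed extensions written by `bwa-mem2 index <prefix>`.
BWA_MEM2_EXTS = (".0123", ".amb", ".ann", ".bwt.2bit.64", ".pac")
# Assumed extensions written by classic `bwa index <prefix>`.
BWA_EXTS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_files(prefix: str, exts: tuple = BWA_MEM2_EXTS) -> list:
    """Return the index file paths for a reference prefix (e.g. the FASTA path)."""
    return [prefix + ext for ext in exts]


def changed_extensions() -> tuple:
    """Return (added, dropped) extension sets when moving from bwa to bwa-mem2."""
    added = set(BWA_MEM2_EXTS) - set(BWA_EXTS)
    dropped = set(BWA_EXTS) - set(BWA_MEM2_EXTS)
    return added, dropped
```

The computed difference matches the review: `.0123` and `.bwt.2bit.64` are new, while `.bwt` and `.sa` are no longer produced, so any rule that listed the old index files as inputs needs the new set.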
227-241: BWA-MEM2 mem rule is well-structured.
The alignment rule has been properly updated to use the BWA-MEM2 tool with appropriate input and output paths. The parameter name change from `sorting` to `sort` appears to be consistent with the new wrapper's expectations, and the `sort_order` documentation has been improved.
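The parameter rename can be expressed as a small migration step. This sketch is hypothetical: it assumes the `bwa-mem2/mem` wrapper accepts `sort` values of `none`, `samtools` or `picard` and `sort_order` values of `coordinate` or `queryname`, as its documentation describes; check the docs for the pinned wrapper version before relying on these sets.

```python
"""Hypothetical helper: migrate bwa/mem wrapper params to the bwa-mem2/mem naming."""

# Assumed accepted values for the bwa-mem2/mem wrapper; verify against the wrapper docs.
ALLOWED_SORT = {"none", "samtools", "picard"}
ALLOWED_SORT_ORDER = {"coordinate", "queryname"}


def migrate_params(old: dict) -> dict:
    """Rename `sorting` to `sort` and validate the sort-related values."""
    new = dict(old)  # leave the caller's dict untouched
    if "sorting" in new:
        new["sort"] = new.pop("sorting")
    sort = new.get("sort", "none")
    if sort not in ALLOWED_SORT:
        raise ValueError(f"unsupported sort: {sort!r}")
    order = new.get("sort_order", "coordinate")
    if order not in ALLOWED_SORT_ORDER:
        raise ValueError(f"unsupported sort_order: {order!r}")
    return new
```

Keeping the check in one place makes a typo such as `sort: "samtool"` fail at parse time rather than inside the wrapper.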
245-245: Mark duplicates input path correctly updated.
The input path for the `mark_duplicates` rule has been properly updated to use the new `.mem2.bam` files produced by the BWA-MEM2 rule.
227-241: ✅ Verification successful
Verify downstream rules' compatibility with the new file naming.
The file naming convention has changed from `.bam` to `.mem2.bam`. While the `mark_duplicates` rule has been updated, it's important to verify that all downstream rules or scripts that might expect the previous naming format have been accounted for.
🏁 Script executed:
```bash
#!/bin/bash
# Check for any other rules or scripts that might reference the old .bam naming convention
rg "\\.bam" --type snakemake | grep -v "mem2.bam" | grep -v "#"
```
Length of output: 96
To catch references in your Snakemake files, let’s search all “.smk” files (rather than using an unsupported “snakemake” file-type in ripgrep):
🏁 Script executed:
```bash
#!/bin/bash
# Look for any .bam references in all Snakemake (.smk) files, excluding our new .mem2.bam
rg "\.bam" -g "*.smk" | grep -v "mem2.bam" | grep -v "#"
```
Length of output: 484
All downstream Snakemake rules are already updated: no stale “.bam” references found
A search across all “*.smk” files (excluding our new “.mem2.bam”) returned only the expected downstream references to “.dedup.bam” and “.dedup.bam.bai” (in `workflow/rules/download.smk` and `workflow/rules/common.smk`). No other rules or scripts still point at the old “.bam” naming for the primary alignments. You can consider this verified.
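The same check can be written without ripgrep, for example as a CI step. This is a minimal sketch that mirrors the second pipeline above, including its blunt rule of skipping any line that contains `#`; unlike ripgrep it does not honour `.gitignore`. The function name is hypothetical.

```python
"""Python equivalent of the ripgrep check for stale .bam references in .smk files."""
from pathlib import Path


def stale_bam_refs(root: str) -> list:
    r"""Return (file, line number, text) for lines mentioning `.bam` but not `mem2.bam`.

    Mirrors: rg "\.bam" -g "*.smk" | grep -v "mem2.bam" | grep -v "#"
    Any line containing `#` is skipped, exactly as in the shell pipeline.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.smk")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if ".bam" in line and "mem2.bam" not in line and "#" not in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

As in the review, references to `.dedup.bam` are still reported, so the output needs the same manual reading; only unexpected primary-alignment paths are a problem.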
bwa-mem2 is known to need a lot of memory for both the index and the mem steps, so we decided to leave this PR open for now and come back to it once we have a solution for the memory consumption!
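While the memory question stays open, the requirement could at least be made explicit to the scheduler. This sketch assumes a rough rule of thumb of about 28 bytes of RAM per reference base for `bwa-mem2 index` (a figure commonly quoted from the bwa-mem2 README; treat it as an assumption and verify it for your version) plus a hypothetical 20% headroom. The result could feed a Snakemake `resources: mem_mb=` entry on the index rule; the helper names are hypothetical, and the genome size is read from a samtools `.fai` file, whose second column is the sequence length.

```python
"""Hypothetical sizing helper for the bwa-mem2 index rule's memory request."""
import math


def genome_size_from_fai(fai_path: str) -> int:
    """Sum the sequence lengths (column 2) of a samtools .fai index."""
    total = 0
    with open(fai_path) as handle:
        for line in handle:
            if line.strip():
                total += int(line.split("\t")[1])
    return total


def index_mem_mb(genome_bases: int, bytes_per_base: float = 28.0, headroom: float = 1.2) -> int:
    """Estimate megabytes (10**6 bytes) for `bwa-mem2 index`, rounded up.

    bytes_per_base and headroom are assumptions, not measured values.
    """
    return math.ceil(genome_bases * bytes_per_base * headroom / 1_000_000)
```

This only sizes the job; it does not reduce bwa-mem2's memory consumption, and the mem step would need its own estimate.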