-
Notifications
You must be signed in to change notification settings - Fork 38
Update Handling of Sequence Variants (Part 4): Add VariantCallFormat to SequenceVariation #967
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Update Handling of Sequence Variants (Part 4): Add VariantCallFormat to SequenceVariation #967
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #967 +/- ##
==========================================
- Coverage 81.00% 80.89% -0.11%
==========================================
Files 269 269
Lines 38826 38910 +84
Branches 4241 4260 +19
==========================================
+ Hits 31450 31478 +28
- Misses 6640 6681 +41
- Partials 736 751 +15
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors SequenceVariation to use a dedicated VariantCallFormat property instead of overloading the Description field, improving type safety and data encapsulation. The key changes include:
- Addition of
VariantCallFormatDataStringproperty toSequenceVariation - New constructor overloads to support VCF data
- Enhanced isoform filtering logic to prevent cross-sequence variant application
- Addition of sequence variant pruning to handle out-of-range coordinates
Reviewed Changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| ProteinXmlEntry.cs | Adds isoform validation for variants, sequence variant pruning, and improves documentation |
| ProteinDbWriter.cs | Updates to write VCF data instead of plain description |
| ProteinDbLoader.cs | Adds nucleotide substitution conversion step during protein loading |
| RnaDecoyGenerator.cs | Updates references from Description to VariantCallFormatDataString |
| DecoyProteinGenerator.cs | Updates references from Description to VariantCallFormatDataString |
| TestDbLoader.cs | Removes extraneous whitespace |
| TestPeptideWithSetMods.cs | Significantly expands test coverage for truncation products with structured validation |
| Test.csproj | Adds expected data file for truncation tests |
| TestIsoTracker.cs | Updates comment style from "Description:" to "VariantCallFormatDataString:" |
| truncationsExpected.tsv | New baseline file containing expected truncation test results |
| VariantTests.cs | Adds comprehensive tests for variant parsing and nucleotide substitution conversion |
| TestProteomicsReadWrite.cs | Updates assertions to use VariantCallFormatDataString |
| TestProteinReader.cs | Updates assertions to use VariantCallFormatDataString |
| VariantCallFormat.cs | Minor documentation correction |
| VariantApplication.cs | Extensive null-safety improvements and refactoring to support new VCF property |
| SequenceVariation.cs | Adds VariantCallFormatDataString property and new constructor overloads |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| [Test] | ||
| public static void TestPeptideWithSetModsReturnsDecoyTruncationsInTopDown() |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] This test method's purpose and behavior are now significantly different from its original implementation. The method name TestPeptideWithSetModsReturnsDecoyTruncationsInTopDown doesn't accurately reflect the new comprehensive table comparison logic. Consider renaming to something like TestTopDownTruncationTableMatchesExpected or ValidateTopDownTruncationsAgainstBaseline.
| public static void TestPeptideWithSetModsReturnsDecoyTruncationsInTopDown() | |
| public static void TestTopDownTruncationTableMatchesExpected() |
| } | ||
| // this is a protein with only VCF lines | ||
| return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinatorics, minAlleleDepth); | ||
| return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinitorics: maxAllowedVariantsForCombinatorics, minAlleleDepth); |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter name 'maxAllowedVariantsForCombinitorics' contains a typo - 'Combinitorics' should be 'Combinatorics'. While this matches the parameter name in the method signature, both should be corrected for consistency.
| return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinitorics: maxAllowedVariantsForCombinatorics, minAlleleDepth); | |
| return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinatorics: maxAllowedVariantsForCombinatorics, minAlleleDepth); |
| public static string CombineDescriptions(IEnumerable<SequenceVariation>? variations) | ||
| { | ||
| return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.Description)); | ||
| return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.VariantCallFormatDataString)); |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is joining VariantCallFormat objects directly, which will call their ToString() method. The original code used variations.Select(d => d.Description), selecting the description string. This should likely be variations.Select(d => d.VariantCallFormatDataString?.Description ?? d.Description) to extract the description string from the VCF object.
| return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.VariantCallFormatDataString)); | |
| return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.VariantCallFormatDataString?.Description ?? d.Description)); |
| private static bool LooksLikeVcf(string s) | ||
| => !string.IsNullOrWhiteSpace(s) | ||
| && (s.Contains("\t") || s.Contains("\\t")) | ||
| && (s.Contains("GT:") || s.Contains(":GT:") || s.Contains(" ANN=") || s.Contains("\tANN=")); |
Copilot
AI
Oct 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LooksLikeVcf method is defined but never used in this file or elsewhere in the diff. If this is intended for future use, consider adding a comment explaining its purpose. Otherwise, it should be removed to avoid dead code.
| private static bool LooksLikeVcf(string s) | |
| => !string.IsNullOrWhiteSpace(s) | |
| && (s.Contains("\t") || s.Contains("\\t")) | |
| && (s.Contains("GT:") || s.Contains(":GT:") || s.Contains(" ANN=") || s.Contains("\tANN=")); |
Minimal change. Kept old constructors for SequenceVariation but added a couple new ones for VariantCallFormat. Updated several unit tests with additional data but no changes to any Asserts. That means that all tests are still returning exactly the same data, but that there is more of it. This was done b/c lots of trouble debugging was foreshadowed.