Skip to content

Conversation

@trishorts
Copy link
Contributor

@trishorts trishorts commented Oct 24, 2025

Minimal change. Kept old constructors for SequenceVariation but added a couple new ones for VariantCallFormat. Updated several unit tests with additional data but no changes to any Asserts. That means that all tests are still returning exactly the same data, but that there is more of it. This was done b/c lots of trouble debugging was foreshadowed.

@codecov
Copy link

codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 70.56075% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.89%. Comparing base (57f1327) to head (193bd67).

Files with missing lines Patch % Lines
mzLib/Omics/BioPolymer/VariantApplication.cs 61.03% 20 Missing and 10 partials ⚠️
mzLib/Omics/BioPolymer/SequenceVariation.cs 51.72% 11 Missing and 3 partials ⚠️
mzLib/UsefulProteomicsDatabases/ProteinXmlEntry.cs 88.54% 7 Missing and 4 partials ⚠️
...Databases/DecoyGeneration/DecoyProteinGenerator.cs 12.50% 0 Missing and 7 partials ⚠️
...micsDatabases/DecoyGeneration/RnaDecoyGenerator.cs 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #967      +/-   ##
==========================================
- Coverage   81.00%   80.89%   -0.11%     
==========================================
  Files         269      269              
  Lines       38826    38910      +84     
  Branches     4241     4260      +19     
==========================================
+ Hits        31450    31478      +28     
- Misses       6640     6681      +41     
- Partials      736      751      +15     
Files with missing lines Coverage Δ
mzLib/Omics/BioPolymer/VariantCallFormat.cs 86.95% <ø> (ø)
mzLib/UsefulProteomicsDatabases/ProteinDbWriter.cs 95.48% <100.00%> (ø)
...micsDatabases/DecoyGeneration/RnaDecoyGenerator.cs 83.72% <50.00%> (ø)
...Databases/DecoyGeneration/DecoyProteinGenerator.cs 90.79% <12.50%> (ø)
mzLib/UsefulProteomicsDatabases/ProteinXmlEntry.cs 96.78% <88.54%> (-2.75%) ⬇️
mzLib/Omics/BioPolymer/SequenceVariation.cs 71.25% <51.72%> (-14.72%) ⬇️
mzLib/Omics/BioPolymer/VariantApplication.cs 78.94% <61.03%> (-2.83%) ⬇️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@trishorts trishorts changed the title Add variant call format attemp two Update Handling of Sequence Variants (Part 4): Add VariantCallFormat to SequenceVariation Oct 24, 2025
@trishorts trishorts requested a review from Copilot October 24, 2025 21:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors SequenceVariation to use a dedicated VariantCallFormat property instead of overloading the Description field, improving type safety and data encapsulation. The key changes include:

  • Addition of VariantCallFormatDataString property to SequenceVariation
  • New constructor overloads to support VCF data
  • Enhanced isoform filtering logic to prevent cross-sequence variant application
  • Addition of sequence variant pruning to handle out-of-range coordinates

Reviewed Changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ProteinXmlEntry.cs Adds isoform validation for variants, sequence variant pruning, and improves documentation
ProteinDbWriter.cs Updates to write VCF data instead of plain description
ProteinDbLoader.cs Adds nucleotide substitution conversion step during protein loading
RnaDecoyGenerator.cs Updates references from Description to VariantCallFormatDataString
DecoyProteinGenerator.cs Updates references from Description to VariantCallFormatDataString
TestDbLoader.cs Removes extraneous whitespace
TestPeptideWithSetMods.cs Significantly expands test coverage for truncation products with structured validation
Test.csproj Adds expected data file for truncation tests
TestIsoTracker.cs Updates comment style from "Description:" to "VariantCallFormatDataString:"
truncationsExpected.tsv New baseline file containing expected truncation test results
VariantTests.cs Adds comprehensive tests for variant parsing and nucleotide substitution conversion
TestProteomicsReadWrite.cs Updates assertions to use VariantCallFormatDataString
TestProteinReader.cs Updates assertions to use VariantCallFormatDataString
VariantCallFormat.cs Minor documentation correction
VariantApplication.cs Extensive null-safety improvements and refactoring to support new VCF property
SequenceVariation.cs Adds VariantCallFormatDataString property and new constructor overloads

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

[Test]
public static void TestPeptideWithSetModsReturnsDecoyTruncationsInTopDown()
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This test method's purpose and behavior are now significantly different from its original implementation. The method name TestPeptideWithSetModsReturnsDecoyTruncationsInTopDown doesn't accurately reflect the new comprehensive table comparison logic. Consider renaming to something like TestTopDownTruncationTableMatchesExpected or ValidateTopDownTruncationsAgainstBaseline.

Suggested change
public static void TestPeptideWithSetModsReturnsDecoyTruncationsInTopDown()
public static void TestTopDownTruncationTableMatchesExpected()

Copilot uses AI. Check for mistakes.
}
// this is a protein with only VCF lines
return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinatorics, minAlleleDepth);
return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinitorics: maxAllowedVariantsForCombinatorics, minAlleleDepth);
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name 'maxAllowedVariantsForCombinitorics' contains a typo - 'Combinitorics' should be 'Combinatorics'. While this matches the parameter name in the method signature, both should be corrected for consistency.

Suggested change
return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinitorics: maxAllowedVariantsForCombinatorics, minAlleleDepth);
return ApplyVariants(protein, protein.SequenceVariations, maxAllowedVariantsForCombinatorics: maxAllowedVariantsForCombinatorics, minAlleleDepth);

Copilot uses AI. Check for mistakes.
public static string CombineDescriptions(IEnumerable<SequenceVariation>? variations)
{
return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.Description));
return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.VariantCallFormatDataString));
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is joining VariantCallFormat objects directly, which will call their ToString() method. The original code used variations.Select(d => d.Description), selecting the description string. This should likely be variations.Select(d => d.VariantCallFormatDataString?.Description ?? d.Description) to extract the description string from the VCF object.

Suggested change
return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.VariantCallFormatDataString));
return variations.IsNullOrEmpty() ? "" : string.Join(", variant:", variations.Select(d => d.VariantCallFormatDataString?.Description ?? d.Description));

Copilot uses AI. Check for mistakes.
Comment on lines 114 to 117
private static bool LooksLikeVcf(string s)
=> !string.IsNullOrWhiteSpace(s)
&& (s.Contains("\t") || s.Contains("\\t"))
&& (s.Contains("GT:") || s.Contains(":GT:") || s.Contains(" ANN=") || s.Contains("\tANN="));
Copy link

Copilot AI Oct 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LooksLikeVcf method is defined but never used in this file or elsewhere in the diff. If this is intended for future use, consider adding a comment explaining its purpose. Otherwise, it should be removed to avoid dead code.

Suggested change
private static bool LooksLikeVcf(string s)
=> !string.IsNullOrWhiteSpace(s)
&& (s.Contains("\t") || s.Contains("\\t"))
&& (s.Contains("GT:") || s.Contains(":GT:") || s.Contains(" ANN=") || s.Contains("\tANN="));

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant