Skip to content

Conversation

@trishorts
Copy link
Contributor

@trishorts trishorts commented Sep 17, 2025

Overview of Major Changes

  1. SequenceVariantDescription.cs renamed VariantCallFormat.cs – this was done for accuracy. All of the parameters of the old class actually map onto VCF file format. This class was not meant to contain generic data about sequence variants. So renaming improves expectations.
  2. VariantCallFormat.cs extended in updated. All of the information about reading VCF files was done several years ago and unsurprisingly there have been updates to the information provided. I examined what is presently reported in VCF files and enabled the newly named method to be able to retrieve all of that information accurately. This revised class should now be able to handle most anything that we throw at it. Depth and Zygosity are key parameters. We use depth to filter which variants get applied. And Zygosity tells us if we have a complete or partial change when applying the variant.
  3. SnpEffAnnotation revisions. Similar story to VCF. Revisions were mostly to better handle string splitting. SnpEff appears out of favor. I don’t expect lots of use. We are keeping this for backwards compatibility. The large block of commented code is for historical reference. It was and is not used. But, it does explain some things. Comments explaining the comments are added.
  4. SequenceVariation revisions. One stop to handle variants from GPTMD, UniProt, VCF. There are a couple of versions because information passed can be different. Sometimes there is only a begin position and sometimes begin and end. We convert everything to the latter. Sequence variants have a modification dictionary because sequence variants can have modifications. In the case where we have a point mutation, it might seem like overkill. But we could have two different mods at the same point. In other cases where we have multi-amino acid change, the modification can be somewhere along the string. Keep a list of mods and positions helps. More later in VariantApplication. Description sometimes comes from VCF and sometimes it comes from UniProt. There is some difference in handling of this parameter. AreValid was a key method as not everything sent in is actually valid. Go figure. Split multi-sample VCF metadata was super confusing to me. But, it turns out that not every line in a VCF describes a single observation. In fact you could have a 100 different samples on the same vcf line. Each would have their own read depth and zygosity. The code here handles the variety of samples thrown at it.
  5. VariantApplication – It was not straightforward deciding how to combine UniProt variants and VCF variants because VCF has depths and zygosity and uniprot does not. Also, counts were bugged. There was no way to specify how many variants per isoform and how many total isoforms. This had to be redone. And many old unit tests were effect by this change. A key in generating combinations of isoforms was making sure that there were no variant collisions. This happens routinely when you have two variants attempting to change the same sequence or sequence that has totally disappeared once a big truncation is applied. This is new code. There are a couple key aspects of applying variants. One, the sequence length can change, which changes all modification coordinates. Two, and modifications on a sequence variant need to get promoted to the variant protein modification dictionary so they are in the correct spot for digestion or not, etc. Then, the applied sequence variant needs to be removed from the list of sequence variants so its not hanging around like a sausage. Formerly, one only got variants from this method. Now it returns the consensus too. Still working to resolve best handling of this.
  6. PeptideWithSetModifications. There were some big changes in the IntersectsAndIdentifiesVariation method. This method is replete with edge cases and was formerly bugged. Revised so all the old unit tests pass and better handling of edge cases. Many more comments.
  7. DecoyProteinGenerator. Manual addition of nics write decoys method. This can be complicated with variants. When reversing a sequence one also needs to adjust all the coordinates of the sequence variants and their modifications. An extra complication is when the variant is at the start and we don’t reverse the start.
  8. RNAdecoygenerator see decoyproteingenerator.
  9. Proteindbloader has new parameters for LoadProteinXML that indicate the desired number of variants per isoform, required minimum sequencing depth and total combinatorial number of isoforms desired. Method converts and variants discovered as modifications in GPTMD to actual sequence variants. All constructors are temporarily retained that convert maxHeterozygousVariants to maxVariantIsoforms etc.
  10. ProteinDbWriter. Fasta change b/c write decoys changes.

Core Additions (SequenceVariation)
• New semantic / convenience properties:
• SearchableAnnotation (VCF line preferred over Description).
• ReferenceAllele / AlternateAllele passthrough to VariantCallFormat.
• Classification flags: IsPointSubstitution, IsMultiResidueSubstitution, IsInsertion, IsDeletion, IsStopGain, IsLikelyFrameshift.
• LegacyVariantDescription (Obsolete shim).
• Genotype expansion:
• SplitPerGenotype(...) creates per‑sample variants with depth filtering, heterozygous handling, and optional reference emission (suppresses invalid no‑ops).
• Combination utilities:
• CombineEquivalent(...) merges equivalent variants (merges modifications, aggregates descriptions).
• Modification management:
• TryAddModification / AddModifications with positional validation against edited span, stop gains, deletions.
• Validation of OneBasedModifications during construction (throws on invalid positions).
• Enhanced validation logic:
• AreValid() now rejects “no‑op” variants (OriginalSequence == VariantSequence and no variant‑specific modifications).
• Frameshift heuristic (length change without insertion/deletion classification and not a stop gain).
• Equality / hashing:
• Description intentionally excluded from Equals / GetHashCode.
• Modification dictionaries compared as position + multiset of modification identifiers.
• SimpleString() retained; spans use Begin-End form.
• Robust constructor overload accepting parsed VariantCallFormat or raw VCF string.
Core Additions (VariantCallFormat)
• Lightweight VCF parser with:
• REF / ALT allele capture.
• ANN (SnpEff) extraction via SnpEffAnnotation (assumed existing).
• FORMAT token parsing into per‑sample dictionaries.
• GT parsing (handles '/', '|' separators; preserves '.' as missing).
• AD parsing with validation + numeric extraction (TryParseAD).
• AlleleIndex resolution against ANN’s allele.
• Zygosity classification per sample (Unknown / Homozygous / Heterozygous).
• Legacy boolean maps (Homozygous / Heterozygous) retained.
• Truncated line tolerance (IsTruncated).
• Validation helpers for restricted vocabularies.
Core Changes / Refactors (VariantApplication)
• GetVariantBioPolymers:
• Early exits clarified; over‑strict validation fallback (uses raw variants if all fail AreValid()).
• Uses new ApplyAllVariantCombinations pipeline.
• ApplyAllVariantCombinations:
• Genotype-aware expansion via SplitPerGenotype before combination enumeration.
• Collapses equivalent variants first (CombineEquivalent).
• Filters no‑ops aggressively.
• Deterministic combination generation; guards for invalid / overlapping variants (ValidCombination).
• Improved safety / resilience:
• Widespread try/catch wrappers prevent a single malformed variant from collapsing processing.
• SafeAreValid helper.
• ApplySingleVariant:
• Rebuilds SequenceVariation instance for applied result (keeping original span semantics).
• Adjusts truncation products, modifications, and subsequent variant coordinates.
• Handles premature stops (sequence truncation) and length recalculations (mass & length attribute normalization via reflection).
• Index adjustment helpers:
• AdjustSequenceVariationIndices: careful overlap & same VCF record handling.
• AdjustTruncationProductIndices: stop-gain aware length clipping.
• AdjustModificationIndices: preserves or shifts mods, avoids out-of-range issues.
• Name/accession formatting:
• GetVariantName aggregates de‑duplicated descriptors (VCF preferred).
• CombineDescriptions / CombineSimpleStrings centralize formatting.
• Nucleotide substitution mod conversion:
• ConvertNucleotideSubstitutionModificationsToSequenceVariants converts “nucleotide substitution” modifications into SequenceVariation entries, pruning the original modifications.
• Data sanitation:
• SanitizeVariantData(...) drops invalid variants, prunes impossible modification coordinates, reconciles applied variant list, yields human-readable notes.
Behavioral / Validation Shifts (Potential Breaking Changes)
• “No‑op” variants (no sequence change and no variant-specific modifications) now invalid (throw in constructor or AreValid() returns false). Existing workflows that stored reference-only variants must adapt (e.g., attach a modification or skip creation).
• Equality semantics changed (Description ignored, modifications included). Hash-based collections may re-cluster.
• Modification coordinate validation is stricter (invalid positions now throw on construction rather than being silently kept).
• SplitPerGenotype filters out heterozygous reference “ref copy” when it would be a no‑op (as asserted in tests).
• Frameshift heuristic might classify certain previous substitutions differently (downstream logic relying on old classification may need review).
New Test Coverage (SequenceVariationNewPropertiesTests)
• Verifies SearchableAnnotation precedence (VCF over description).
• Checks allele passthrough, classification predicates, stop-gain vs frameshift disambiguation.
• Ensures constructor rejects no‑ops unless variant-specific modifications exist.
• Validates modification position enforcement & deletion + modification rejection.
• Genotype splitting logic (only alt entries produced when ref copy invalid).
• CombineEquivalent merges descriptions and modifications.
• Equality semantics ignoring Description.
• Convenience constructor end coordinate logic.
• SimpleString formatting rules.
• LegacyVariantDescription continuity.
Reliability / Robustness Improvements
• Defensive cloning of modification lists when combining or splitting variants.
• Frequency-based multiset comparison for modifications (stable deterministic hashing).
• Reflection-based safe mass/length updates tolerant of missing or ambiguous residues.
• Comprehensive guards to prevent index drift after frameshifts / stops.
• SanitizeVariantData provides a pre-write data hygiene pipeline.
Performance Considerations
• Combination generation uses iterative index method (avoids recursion allocations).
• Early dismissal of invalid or depth-insufficient genotypes.
• OrderBy stages localized (only where deterministic ordering needed).
• HashCode struct used for consolidated hashing.
Suggested Upgrade Notes for Consumers

  1. Audit any creation of “reference-only” variants; adapt to new validation or skip creating them.
  2. Revisit equality-based dictionaries/sets: Description no longer differentiates variants.
  3. Ensure downstream tools expecting original modification leniency handle earlier exceptions.
  4. If using pre-existing variant combination logic, validate new genotype expansion doesn’t inflate search space unexpectedly; tune maxSequenceVariantsPerIsoform / maxSequenceVariantIsoforms.
  5. Review reporting paths that consumed previous variant name/accession formatting (they may now be longer with multiple descriptors).

@trishorts trishorts added the bug label Sep 17, 2025
@trishorts trishorts requested a review from Copilot September 18, 2025 15:48
@codecov
Copy link

codecov bot commented Sep 18, 2025

Codecov Report

❌ Patch coverage is 88.07286% with 203 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.89%. Comparing base (a682b7a) to head (3d73744).

Files with missing lines Patch % Lines
mzLib/Omics/BioPolymer/VariantApplication.cs 81.73% 49 Missing and 48 partials ⚠️
mzLib/Omics/BioPolymer/SequenceVariation.cs 91.96% 1 Missing and 35 partials ⚠️
...Databases/DecoyGeneration/DecoyProteinGenerator.cs 88.97% 14 Missing and 13 partials ⚠️
...roteolyticDigestion/PeptideWithSetModifications.cs 88.88% 5 Missing and 12 partials ⚠️
mzLib/Omics/BioPolymer/VariantCallFormat.cs 88.13% 3 Missing and 11 partials ⚠️
...micsDatabases/DecoyGeneration/RnaDecoyGenerator.cs 94.48% 6 Missing and 1 partial ⚠️
mzLib/Omics/BioPolymer/SnpEffAnnotation.cs 93.67% 0 Missing and 5 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #956      +/-   ##
==========================================
- Coverage   80.97%   80.89%   -0.09%     
==========================================
  Files         269      269              
  Lines       38744    39999    +1255     
  Branches     4228     4551     +323     
==========================================
+ Hits        31374    32357     +983     
- Misses       6639     6779     +140     
- Partials      731      863     +132     
Files with missing lines Coverage Δ
mzLib/Omics/Modifications/Modification.cs 97.47% <ø> (ø)
mzLib/Proteomics/Protein/Protein.cs 97.23% <100.00%> (-0.19%) ⬇️
mzLib/UsefulProteomicsDatabases/ProteinDbLoader.cs 95.29% <ø> (-0.73%) ⬇️
mzLib/UsefulProteomicsDatabases/ProteinDbWriter.cs 91.32% <ø> (-4.16%) ⬇️
mzLib/UsefulProteomicsDatabases/ProteinXmlEntry.cs 96.17% <ø> (-3.35%) ⬇️
...ProteomicsDatabases/Transcriptomics/RnaDbLoader.cs 78.70% <ø> (-14.99%) ⬇️
mzLib/Omics/BioPolymer/SnpEffAnnotation.cs 88.32% <93.67%> (+9.40%) ⬆️
...micsDatabases/DecoyGeneration/RnaDecoyGenerator.cs 94.73% <94.48%> (+11.01%) ⬆️
mzLib/Omics/BioPolymer/VariantCallFormat.cs 88.13% <88.13%> (ø)
...roteolyticDigestion/PeptideWithSetModifications.cs 92.66% <88.88%> (-3.27%) ⬇️
... and 3 more

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@trishorts trishorts requested review from Copilot, nbollis and zhuoxinshi and removed request for Copilot and nbollis September 18, 2025 16:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses two issues with variant application in protein XML parsing: 1) prevents crashes when overlapping sequence variants are applied together, and 2) filters out variants that target alternate protein isoforms rather than the base sequence.

  • Adds validation to prevent application of overlapping sequence variants that would cause position conflicts
  • Filters out sequence variants that reference alternate isoforms (e.g., "Q96J88-3") to avoid misapplication
  • Refactors the ParseAnnotatedMods method for improved readability

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
ProteinXmlEntry.cs Adds isoform filtering logic and refactors ParseAnnotatedMods method
VariantApplication.cs Adds overlap validation for variant combinations
TestProteomicsReadWrite.cs Adds tests for long substitution conflicts and alternate isoform handling
Test.csproj Includes new test XML files in build output
longSubstitution.xml Test case with overlapping variants
sequenceVariantOnAlternateIsoform.xml Test case with variants on alternate isoforms

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +458 to +460
appliesToThisSequence =
LocationSequenceId.Equals(acc, StringComparison.OrdinalIgnoreCase)
|| (!string.IsNullOrEmpty(acc) && LocationSequenceId.Equals($"{acc}-1", StringComparison.OrdinalIgnoreCase));
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The hardcoded '-1' suffix check assumes the base isoform is always numbered as '-1', but this may not be consistent across all protein databases. Consider using a more flexible approach or documenting this assumption.

Copilot uses AI. Check for mistakes.
public static List<TBioPolymerType> GetVariantBioPolymers<TBioPolymerType>(this TBioPolymerType protein, int maxAllowedVariantsForCombinatorics = 4, int minAlleleDepth = 1)
where TBioPolymerType : IHasSequenceVariants
{
if(maxAllowedVariantsForCombinatorics == 0)
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after 'if' keyword violates C# formatting conventions.

Suggested change
if(maxAllowedVariantsForCombinatorics == 0)
if (maxAllowedVariantsForCombinatorics == 0)

Copilot uses AI. Check for mistakes.
{
foreach (var combo in GetCombinations(variations, size))
{
if(!ValidCombination(combo.ToList()))
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after 'if' keyword violates C# formatting conventions.

Suggested change
if(!ValidCombination(combo.ToList()))
if (!ValidCombination(combo.ToList()))

Copilot uses AI. Check for mistakes.
{
foreach (var combo in GetCombinations(variations, size))
{
if(!ValidCombination(combo.ToList()))
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling ToList() on each combination creates unnecessary allocations. Consider changing ValidCombination to accept IEnumerable instead of List.

Suggested change
if(!ValidCombination(combo.ToList()))
if(!ValidCombination(combo))

Copilot uses AI. Check for mistakes.
@trishorts trishorts requested a review from Copilot September 18, 2025 23:17
@trishorts trishorts marked this pull request as ready for review September 18, 2025 23:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

string acc = Accession ?? "";
appliesToThisSequence =
LocationSequenceId.Equals(acc, StringComparison.OrdinalIgnoreCase)
|| (!string.IsNullOrEmpty(acc) && LocationSequenceId.Equals($"{acc}-1", StringComparison.OrdinalIgnoreCase));
Copy link

Copilot AI Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded '-1' suffix check assumes a specific isoform naming pattern. Consider using a more flexible approach that checks if LocationSequenceId starts with the accession followed by a dash, or make this configurable to handle different isoform naming conventions.

Suggested change
|| (!string.IsNullOrEmpty(acc) && LocationSequenceId.Equals($"{acc}-1", StringComparison.OrdinalIgnoreCase));
|| (!string.IsNullOrEmpty(acc) && LocationSequenceId.StartsWith(acc + "-", StringComparison.OrdinalIgnoreCase));

Copilot uses AI. Check for mistakes.
zhuoxinshi
zhuoxinshi previously approved these changes Sep 19, 2025
Copy link
Member

@nbollis nbollis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some issues left over from copilot:

  1. Rewrote entire sections of code that should not have been affected by changing how variants operate.
  2. Deleted a bunch of comments unnecessarily.
  3. Commented about what change in the code occurred due to your copilot prompt, but does not add any understanding to what is happening and why in the function.

Max variant isoforms being 1 to get the canonical out feels bad. The sequence we get out is not a variant isoform.

public int CodingDomainSequenceLengthIncludingStopCodon { get; }
public int OneBasedProteinPosition { get; }
public int ProteinLength { get; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all comments back in snpeff

/// Positions are validated in <see cref="AreValid"/> against the altered span (<see cref="VariantSequence"/>).
/// </summary>
public int OneBasedEndPosition { get; }
public Dictionary<int, List<Modification>> OneBasedModifications { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to store the full mod dictionary here? This would (likely) more than double the memory footprint of the modifications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short answer is yes.

OriginalSequence?.Length > 1 &&
VariantSequence?.Length == OriginalSequence.Length &&
OriginalSequence != VariantSequence &&
!IsPointSubstitution;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!IsPointSubstitution is a redundant check as the length check already does this.

/// <summary>
/// True if variant introduces a stop (* at end).
/// </summary>
public bool IsStopGain => VariantSequence?.EndsWith("*", StringComparison.Ordinal) == true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if bool == true?

/// Heuristic frameshift flag: length difference not equal & not simple stop gain only.
/// (Refine if you have explicit annotation elsewhere.)
/// </summary>
public bool IsLikelyFrameshift =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by frame shift? How can we evaluate a frame shift from only protein sequences?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the entirety of XML writing change here? I would assume only the variant relevant components needed to change?

@@ -30,7 +31,7 @@ public class ProteinXmlEntry
public string FeatureDescription { get; private set; }
public string SubFeatureType { get; private set; }
public string SubFeatureDescription { get; private set; }
public string OriginalValue { get; private set; } = ""; // if no content is found, assume it is empty, not null (e.g. <original>A</original><variation/> for a deletion event)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why delete these comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake. restored to former glory

@@ -172,19 +157,7 @@ private void ParseEntryAttributes(XmlReader xml)
DatabaseVersionEntryTag = xml.GetAttribute("version");
XmlnsEntryTag = xml.GetAttribute("xmlns");
}
/// <summary>
/// Parses some attributes of a &lt;sequence&gt; XML element and assigns their values to the corresponding properties of the ProteinXmlEntry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment was deleted again :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake. restored to former glory

/// <summary>
/// Computes the monoisotopic mass of the given sequence.
/// Returns 0 if the sequence is empty.
/// </summary>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake. restored to former glory

Sequence = ProteinDbLoader.SanitizeAminoAcidSequence(Sequence, 'X');

// NEW: prune any sequence variants whose coordinates exceed the now-known sequence length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover uninformative AI comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revised comment

public string SimpleString()
{
return OriginalSequence + OneBasedBeginPosition.ToString() + VariantSequence;
if (OneBasedBeginPosition == OneBasedEndPosition || (OriginalSequence?.Length ?? 0) <= 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make this the ToString() method. Is there another ToString method that I missed?

segment.OneBasedEndPosition >= OneBasedBeginPosition && segment.OneBasedBeginPosition <= OneBasedEndPosition;

internal bool Intersects(TruncationProduct segment) =>
segment.OneBasedEndPosition >= OneBasedBeginPosition && segment.OneBasedBeginPosition <= OneBasedEndPosition;
Copy link
Contributor

@Alexander-Sol Alexander-Sol Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Intersects methods will only return true if the segment is a subset of this SequenceVariation, but not if they merely overlap, e.g.,
segment.OneBasedEndPosition >= OneBasedBeginPosition && segment.OneBasedEndPosition <= OneBasedEndPosition && segment.OneBasedBeginPosition OneBasedEndPosition

Is this the intended behavior?

public string[] Effects { get; } = Array.Empty<string>();
public string PutativeImpact { get; } = string.Empty;
public string GeneName { get; } = string.Empty;
public string GeneID { get; } = string.Empty;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we initializing fields in this fashion? This isn't done anywhere else in the code base


public string[] Warnings { get; }
public string[] Warnings { get; } = Array.Empty<string>();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can wer remove lines 70-72, as they're unused?

if (int.TryParse(a[14], out y)) DistanceToFeature = y;
Warnings = a[15].Split('&');

string Get(int idx) => idx >= 0 && idx < a.Length ? a[idx] : string.Empty;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool use of local functions

HGVSNotationDnaLevel = Get(9);
HGVSNotationProteinLevel = Get(10);

void ParseSlashField(string value, ref int first, ref int second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a private method?

ParseSlashField(Get(13), ref pos, ref len);
OneBasedProteinPosition = pos;
ProteinLength = len;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use this syntax? Why not just continually redefine/overwrite the post and len variables

}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

true, DecoyType.None, new List<Modification>(), false, new List<string>(),
out um,
maxSequenceVariantsPerIsoform: 4, minAlleleDepth: 1, totalConsensusPlusVariantIsoforms: 1);
Assert.That(new_proteins.First().OneBasedPossibleLocalizedModifications.First().Value.First().NeutralLosses.First().Value.Count == 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the default behavior doesn't change, then should we be modifying method calls in test that aren't explicitly about variants?

}

//[Test]
//[Explicit("Long-running diagnostic; generates protein_variant_log.txt with per-protein variant expansion results.")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented code

Copy link
Contributor

@Alexander-Sol Alexander-Sol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes made are far reaching and extensive, which makes this PR difficult to review.

If possible, please reduce the size of this PR, eliminating unnecessary changes and all changes not directly related to sequence variants. If this PR can be split up into multiple, smaller PRs, that would be preferred.

using MzLibUtil;
using Omics.BioPolymer;
using Omics.Modifications;
using System.Reflection;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is concerning

{
var seq = created?.BaseSequence;
if (!string.IsNullOrEmpty(seq))
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside this try statement, you're using reflection to access properties of your variant. This is concerning, as reflection should rarely be used in production code. It's slow and finicky.

Can you refactor this to avoid using reflection?

writer.WriteEndElement(); // feature or subfeature
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large chunks of this file were re-written. Are all rewrites necessary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants