Ptm stoichiometry #955

pcruzparri · 2025-09-16T20:33:36Z

New PR with changes from PR #797 and more.

Description:
PTM stoichiometry classes are created for each structural level (protein group -> protein -> (base) peptide -> modification). The most important class to understand is probably QuantifiedPeptide: each object represents a collection of post-translationally modified variants that share the same base sequence. This contrasts with other Peptide classes, which represent a peptide with a single full sequence. The QuantifiedProtein class stores the QuantifiedPeptide objects, handles peptide-to-protein indexing, and computes the modification stoichiometry for the protein. The QuantifiedModification class is primarily a data class, and the QuantifiedProteinGroup class is mostly a container for the QuantifiedProtein members of the protein group. Lastly, the PositionFrequencyAnalysis class has a method that takes a list of (full sequence, protein group list, intensity) tuples and creates a collection of QuantifiedProteinGroup objects.
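To make the nesting described above concrete, here is a minimal sketch of the four levels. Every type and member name ending in `Sketch` is hypothetical, and the dictionary keys (modification name, base sequence, accession) are assumptions for illustration rather than what the PR's classes actually use.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified shapes; these are not the classes added in the PR.
public sealed class ModificationSketch
{
    public string Name { get; set; } = "";
    public int PositionInPeptide { get; set; }
    public double Intensity { get; set; }
}

public sealed class PeptideSketch
{
    public string BaseSequence { get; set; } = "";

    // Peptide position -> (modification name -> summed record). Inner key is an assumption.
    public Dictionary<int, Dictionary<string, ModificationSketch>> ModifiedPositions { get; } = new();

    // Accumulates intensity for one modified variant of this base sequence.
    public void AddModification(int position, string name, double intensity)
    {
        if (!ModifiedPositions.TryGetValue(position, out var modsAtPosition))
        {
            modsAtPosition = new Dictionary<string, ModificationSketch>();
            ModifiedPositions[position] = modsAtPosition;
        }
        if (!modsAtPosition.TryGetValue(name, out var mod))
        {
            mod = new ModificationSketch { Name = name, PositionInPeptide = position };
            modsAtPosition[name] = mod;
        }
        mod.Intensity += intensity;
    }
}

public sealed class ProteinSketch
{
    public string Accession { get; set; } = "";
    // Keyed by base sequence (assumption).
    public Dictionary<string, PeptideSketch> Peptides { get; } = new();
}

public sealed class ProteinGroupSketch
{
    public string Name { get; set; } = "";
    // Keyed by protein accession (assumption).
    public Dictionary<string, ProteinSketch> Proteins { get; } = new();
}
```

The point of the sketch is only that one peptide object collects all modified variants of a base sequence, so intensities for the same modification at the same position are summed rather than stored per full sequence.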

codecov · 2025-09-16T20:43:19Z

Codecov Report

❌ Patch coverage is 82.22222% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.99%. Comparing base (a682b7a) to head (6389de1).
⚠️ Report is 2 commits behind head on master.

⚠️ Current head 6389de1 differs from pull request most recent head 302edd7

Please upload reports for the commit 302edd7 to get more accurate results.

Files with missing lines	Patch %	Lines
...til/PositionFrequencyAnalysis/QuantifiedProtein.cs	71.26%	19 Missing and 6 partials ⚠️
...til/PositionFrequencyAnalysis/QuantifiedPeptide.cs	81.42%	12 Missing and 1 partial ⚠️
...tionFrequencyAnalysis/PositionFrequencyAnalysis.cs	97.14%	0 Missing and 1 partial ⚠️
...ositionFrequencyAnalysis/QuantifiedProteinGroup.cs	93.33%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #955      +/-   ##
==========================================
+ Coverage   80.97%   80.99%   +0.01%     
==========================================
  Files         269      274       +5     
  Lines       38744    38969     +225     
  Branches     4228     4267      +39     
==========================================
+ Hits        31374    31561     +187     
- Misses       6639     6669      +30     
- Partials      731      739       +8

Files with missing lines	Coverage Δ
mzLib/MzLibUtil/ClassExtensions.cs	`100.00% <100.00%> (+1.68%)`	⬆️
...ositionFrequencyAnalysis/QuantifiedModification.cs	`100.00% <100.00%> (ø)`
...tionFrequencyAnalysis/PositionFrequencyAnalysis.cs	`97.14% <97.14%> (ø)`
...ositionFrequencyAnalysis/QuantifiedProteinGroup.cs	`93.33% <93.33%> (ø)`
...til/PositionFrequencyAnalysis/QuantifiedPeptide.cs	`81.42% <81.42%> (ø)`
...til/PositionFrequencyAnalysis/QuantifiedProtein.cs	`71.26% <71.26%> (ø)`

acesnik · 2025-10-06T15:36:32Z

Hi there! @trishorts asked me to take a look at this one. It seems like a great pull request for getting modification ratios from results.

On stoichiometry, it's worth considering Jesper Olsen's work on stoichiometry measurements back in 2010 and what's been going on since then. Getting a true stoichiometry measure has required some tricks in the past to get the fully unoccupied quantity. I think the ratios might be useful, but I've struggled in the past when I've calculated such ratios to claim confidently that they're stoichiometries. You could consider using that original data to compare how ratios stack up to the measured stoichiometries if you're writing this up in a paper. https://www.science.org/doi/abs/10.1126/scisignal.2000475

I haven't looked at this logic in a long time, and it looks like it's been changed a bit, but do you think there is work to be done in mzLib/Omics/BioPolymer/VariantApplication.cs, or do you think there's logic to add to these new methods regarding sequence variations? Proteins with amino acid variations may have modifications listed within those data structures, https://github.com/acesnik/mzLib/blob/master/mzLib/Omics/BioPolymer/SequenceVariation.cs#L24. Should those be quantified and tested for these ratio calculations in the situations when amino acid variations are added?

pcruzparri · 2025-10-06T19:30:32Z

@acesnik thank you for your thoughts and the paper you shared! I had not come across it and still need to get through it more closely, but it seems like a great reference for a paper.

You are absolutely right that this calculation just outputs the residue-specific intensity ratios, and that this is not the full picture of a site-occupancy calculation. I still need to compare the values obtained from this analysis to the residue-specific PSM count ratios we currently output in MetaMorpheus, but the goal is to output something quantitatively closer to a mod's stoichiometry. I'm thinking that correcting/scaling the intensities to account for peptide ionization efficiency would be a follow-up enhancement to this code (or maybe on the MetaMorpheus end?). There is still a little more exploring I'd like to do before deciding which approaches we'd like to support in mzLib or MetaMorpheus for stoichiometry calculation, but in the meantime anyone who would like to implement their own normalization/scaling strategies can do so from the output provided here.
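As a rough illustration of what a residue-specific intensity ratio means here, the sketch below divides the intensity carrying a modification at a residue by the total intensity observed covering that residue. The class and method names are hypothetical, and the handling of a zero total (returning NaN) is an assumption, not the PR's behavior.

```csharp
using System;

// Hypothetical helper; not part of mzLib.
public static class IntensityRatioSketch
{
    // Fraction of the total intensity at a residue that carries the modification.
    // No ionization-efficiency correction is applied, so this is a ratio,
    // not a measured stoichiometry.
    public static double Compute(double modifiedIntensity, double totalIntensity)
    {
        if (totalIntensity <= 0)
        {
            return double.NaN; // nothing observed at this residue
        }
        return modifiedIntensity / totalIntensity;
    }
}
```

For example, `IntensityRatioSketch.Compute(30, 120)` gives 0.25: a quarter of the signal covering that residue came from modified forms.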

As for sequence variations, variants in MetaMorpheus get their own accessions, are treated as different proteins, and are placed into protein groups accordingly. Once the protein groups and variant sequences are extracted, this code treats them like any other protein group. When this code is used directly from mzLib, sequence variants would need to be passed in as different proteins. In either case, the code here (specifically, the SetUpQuantificationObjectsFromFullSequences() method) is written to take only a full peptide sequence, the names of the protein groups the full sequence belongs to, and the peptide's intensity, and then create new quantification objects. All mod information is parsed from the full sequence string and localized using the full sequence and the provided protein sequences (which would require passing each variant protein sequence separately).
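The idea of parsing mod information out of a full sequence string can be sketched as follows. This assumes the bracketed notation used by MetaMorpheus, for example `PEPT[Common Biological:Phosphorylation on T]IDE`; the class and method names are hypothetical, and the sketch ignores terminal-mod separators and multiple mods at one position.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser; the PR's own parsing code may differ.
public static class FullSequenceSketch
{
    // Returns the base sequence and the mods keyed by the one-based position of the
    // residue preceding the bracket (0 means the bracket came before any residue).
    public static (string BaseSequence, Dictionary<int, string> Mods) Parse(string fullSequence)
    {
        var baseSequence = new StringBuilder();
        var currentMod = new StringBuilder();
        var mods = new Dictionary<int, string>();
        int depth = 0; // bracket nesting depth, so brackets inside mod names survive

        foreach (char c in fullSequence)
        {
            if (c == '[')
            {
                if (depth > 0) currentMod.Append(c);
                depth++;
            }
            else if (c == ']')
            {
                depth--;
                if (depth < 0) throw new FormatException("Unbalanced ']' in full sequence.");
                if (depth == 0)
                {
                    mods[baseSequence.Length] = currentMod.ToString();
                    currentMod.Clear();
                }
                else
                {
                    currentMod.Append(c);
                }
            }
            else if (depth > 0)
            {
                currentMod.Append(c);
            }
            else
            {
                baseSequence.Append(c);
            }
        }

        if (depth != 0) throw new FormatException("Unbalanced '[' in full sequence.");
        return (baseSequence.ToString(), mods);
    }
}
```

Localizing a mod to a protein position then only needs the peptide's start index in the protein sequence, which is why each variant protein sequence has to be supplied separately.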

Please let me know if you have further thoughts/concerns I can address!

Alexander-Sol

Left a couple of review comments. Two things that jump out:

Lack of summary comments. There are a lot of dictionaries in use, and I'm not always sure what the keys are just from looking at them. It would be nice to have summary comments that explain the dictionaries.
IdWithMotif doesn't include the modification type (the part before the colon).

Alexander-Sol · 2025-10-22T18:06:44Z

mzLib/MzLibUtil/PositionFrequencyAnalysis/PositionFrequencyAnalysis.cs

+    {
+        public Dictionary<string, QuantifiedProteinGroup> ProteinGroups { get; private set; }
+
+        //public Dictionary<string, (QuantifiedPeptide QuantifiedPeptide, string ProteinGroups)> Peptides { get; private set; }


Remove commented out property

Alexander-Sol · 2025-10-24T22:12:21Z

mzLib/MzLibUtil/PositionFrequencyAnalysis/PositionFrequencyAnalysis.cs

+        /// all of the amino acids in that peptide.</returns>
+        ///
+        public void SetUpQuantificationObjectsFromFullSequences(List<(string fullSeq, List<string> proteinGroups, double intensity)> peptides, Dictionary<string, string> proteinSequences=null)
+        {


Instead of passing in a list of tuples, have you considered making a lightweight class to hold that information? Like a record class that stores the sequence, protein groups, and intensity. Also, it's not clear what the proteinGroups are and what information they contain. In IQuantifiableRecord, a tuple stores accessions, gene names, and organisms for the different protein groups. Just using Accessions would probably work as well, but I would like guidance on what the proteinGroups actually are
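A lightweight record along the lines suggested above might look like this. The type and property names are hypothetical, and treating the protein groups as a list of accession strings is an assumption; this is not the record that was later added to the PR.

```csharp
using System.Collections.Generic;

// Hypothetical input record replacing the (fullSeq, proteinGroups, intensity) tuple.
public sealed record PeptideInputSketch(
    string FullSequence,                           // full sequence including bracketed mods
    IReadOnlyList<string> ProteinGroupAccessions,  // assumption: one accession string per group
    double Intensity);
```

Named properties make call sites self-documenting, and the record could later grow gene names or organisms without changing every method signature that accepts it.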

Alexander-Sol · 2025-10-24T22:13:50Z

mzLib/MzLibUtil/PositionFrequencyAnalysis/PositionFrequencyAnalysis.cs

+                    {
+                        ProteinGroups[pg] = new QuantifiedProteinGroup(pg);
+                    }
+                    var proteinGroup = ProteinGroups[pg];


If there are actually multiple protein groups associated with the peptide, should we store the combined protein group in the dictionary? What would happen if we split first, as in line 40, then added them to the dictionary?

Alexander-Sol · 2025-10-24T22:16:38Z

mzLib/MzLibUtil/PositionFrequencyAnalysis/QuantifiedPeptide.cs

+        public string BaseSequence { get; set; }
+        public QuantifiedProtein ParentProtein { get; set; }
+        public int OneBasedStartIndexInProtein { get; set; }
+        public Dictionary<int, Dictionary<string, QuantifiedModification>> ModifiedAminoAcidPositions { get; set; }


What string serves as the key in the <string, QuantMod> dictionary?

If the string just stores position, could ModifiedAminoAcidPositions just be a Dictionary<int, List<QuantifiedModification>>? The position string seems redundant.

Alexander-Sol · 2025-10-24T22:20:34Z

mzLib/MzLibUtil/PositionFrequencyAnalysis/QuantifiedProtein.cs

+                    peptide.OneBasedStartIndexInProtein = Sequence.IndexOf(peptide.BaseSequence) + 1;
+                }
+                // if peptide has no modifications, add to all its positions
+                if (!peptide.ModifiedAminoAcidPositions.IsNotNullOrEmpty())


IsNullOrEmpty() would be slightly cleaner
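For reference, a positive-form helper of the kind suggested here could be written as below. The method name is deliberately suffixed with `Sketch` because this is a hypothetical illustration, not the extension that exists in mzLib's ClassExtensions.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical extension; shown only to illustrate the suggested readability change.
public static class EnumerableSketchExtensions
{
    // True when the sequence is null or contains no elements.
    public static bool IsNullOrEmptySketch<T>(this IEnumerable<T> source)
    {
        return source == null || !source.Any();
    }
}
```

With such a helper the condition reads `if (peptide.ModifiedAminoAcidPositions.IsNullOrEmptySketch())`, which avoids the double negative of `!...IsNotNullOrEmpty()`.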

Alexander-Sol · 2025-10-24T22:24:07Z

mzLib/MzLibUtil/PositionFrequencyAnalysis/QuantifiedProteinGroup.cs

+        public string Name { get; set; }
+        public Dictionary<string, QuantifiedProtein> Proteins { get; set; }
+
+        public QuantifiedProteinGroup(string name, Dictionary<string, QuantifiedProtein> proteins = null)


What is the string key here?

Alexander-Sol · 2025-10-24T22:26:45Z

mzLib/Test/TestMzLibUtil.cs

+        [Test]
+        public void TestQuantifiedModification()
+        {
+            var quantmod = new QuantifiedModification(idWithMotif: "TestMod: ModX on AAY", positionInPeptide: 1, positionInProtein: 2, intensity: 10);


IdWithMotif just refers to the part after the colon. The full mod string is {Modification Type}: {Id with motif}
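The `{Modification Type}: {Id with motif}` layout described in this comment can be separated as in the sketch below. The helper name is hypothetical, and returning an empty type when no colon is present is an assumption.

```csharp
using System;

// Hypothetical helper for splitting a full mod string at the first colon.
public static class ModStringSketch
{
    public static (string ModificationType, string IdWithMotif) Split(string fullModString)
    {
        int colon = fullModString.IndexOf(':');
        if (colon < 0)
        {
            // No type prefix present; treat the whole string as the id with motif.
            return ("", fullModString.Trim());
        }
        string type = fullModString.Substring(0, colon).Trim();
        string idWithMotif = fullModString.Substring(colon + 1).Trim();
        return (type, idWithMotif);
    }
}
```

Applied to the test string above, `ModStringSketch.Split("TestMod: ModX on AAY")` yields the type `TestMod` and the id with motif `ModX on AAY`.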

pcruzparri added WIP New Feature labels Sep 16, 2025

pcruzparri mentioned this pull request Sep 16, 2025

PTM Stoichiometry #797

Closed

3 tasks

pcruzparri force-pushed the PTMStoichiometry branch from 589b275 to 8a895de Compare September 30, 2025 18:57

pcruzparri marked this pull request as ready for review October 2, 2025 18:53

pcruzparri added the ready for review label Oct 2, 2025

acesnik self-requested a review October 4, 2025 00:34

pcruzparri force-pushed the PTMStoichiometry branch from 692b5ed to a874306 Compare October 20, 2025 16:28

pcruzparri added 6 commits October 23, 2025 13:27

New clean repo with ptm_stoch contents. The methods for occupancy cal…

3b1ddb8

…culation in mzlibutils were copied from the previous branch onto this one. Need to add/remake the tests next.

Added TestMzLibUtils tests for quantified mods, peptides, and protein…

2949457

…s. Need tests for the protein groups and the occupancy set up (currently called CalculateOccupancies).

Added PG and Quant object setup tests. Need to finish these tests, th…

25bf8da

…ough

Finshed TestSetUpQuantificationObjects. Removed Peptides field (and i…

31c40cd

…ts population) from SetUpQuantificationObjects method for now.

Refactored quantification util classes

1cbfbaf

improving quantprot exception throw.

6389de1

pcruzparri force-pushed the PTMStoichiometry branch from a874306 to 6389de1 Compare October 23, 2025 18:29

Alexander-Sol requested changes Oct 24, 2025

Extended commenting. Added a peptide record class that stores the pep…

302edd7

…tide input for setting up the protein groups and the quantifications.
