Ptm stoichiometry #955
Codecov Report
Please upload reports for the commit 302edd7 to get more accurate results.
Coverage Diff
## master #955 +/- ##
==========================================
+ Coverage 80.97% 80.99% +0.01%
==========================================
Files 269 274 +5
Lines 38744 38969 +225
Branches 4228 4267 +39
==========================================
+ Hits 31374 31561 +187
- Misses 6639 6669 +30
- Partials 731 739 +8
589b275 to
8a895de
Hi there! @trishorts asked me to take a look at this one. It looks like a great pull request for getting modification ratios from results.

On stoichiometry, it's worth considering Jesper Olsen's work on stoichiometry measurements back in 2010 and what has happened since then. Getting a true stoichiometry measure has historically required some tricks to obtain the fully unoccupied quantity. I think the ratios might be useful, but when I've calculated such ratios in the past, I've struggled to claim confidently that they're stoichiometries. If you're writing this up in a paper, you could use that original data to compare how the ratios stack up against the measured stoichiometries. https://www.science.org/doi/abs/10.1126/scisignal.2000475

I haven't looked at this logic in a long time, and it looks like it's been changed a bit, but do you think there is work to be done in
@acesnik thank you for your thoughts and the paper you shared! I had not come across it and still need to read it more closely, but it seems like a great reference for a paper. You are absolutely right that this calculation only outputs the residue-specific intensity ratios, which is not the full picture of a site-occupancy calculation. I still need to compare the values obtained from this analysis to the residue-specific PSM count ratios we currently output in MetaMorpheus, but the goal is to output something quantitatively closer to a mod's stoichiometry.

I'm thinking that correcting/scaling the intensities to account for peptide ionization efficiency would be a follow-up enhancement to this code (or maybe on the MetaMorpheus end?). There is still a little more exploring I'd like to do before deciding which approaches we'd like to support in mzLib or MetaMorpheus for stoichiometry calculation, but in the meantime, anyone who would like to implement their own normalization/scaling strategy can do so from the output provided here.

As for sequence variations, variants in MetaMorpheus get their own accessions, are treated as different proteins, and are placed accordingly into protein groups. Once the protein groups and variant sequences are extracted, this code treats them like any other protein group. When using this code directly from mzLib, sequence variants would need to be passed in as different proteins. In either case, the code here (specifically, the

Please let me know if you have further thoughts/concerns I can address!
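To make the term "residue-specific intensity ratio" concrete, here is a minimal sketch of the kind of ratio being discussed: the share of the summed intensity at one residue that comes from peptides carrying the modification there. The class and method names (`SiteRatioSketch`, `ModifiedFraction`) are hypothetical and are not taken from the PR. As noted in the discussion, this is a ratio and not a corrected stoichiometry, since it ignores differences in ionization efficiency.

```csharp
using System;
using System.Collections.Generic;

public static class SiteRatioSketch
{
    // Hypothetical helper (not from the PR): fraction of the summed intensity
    // at one residue that is carried by peptides modified at that residue.
    // Returns NaN when no intensity covers the site.
    public static double ModifiedFraction(
        IEnumerable<(bool isModifiedAtSite, double intensity)> peptidesCoveringSite)
    {
        double total = 0;
        double modified = 0;
        foreach (var (isModified, intensity) in peptidesCoveringSite)
        {
            total += intensity;
            if (isModified)
            {
                modified += intensity;
            }
        }
        return total > 0 ? modified / total : double.NaN;
    }
}
```

For example, one modified peptide with intensity 10 and one unmodified peptide with intensity 30 covering the same residue give a ratio of 10 / (10 + 30) = 0.25.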
692b5ed to
a874306
…culation in mzlibutils were copied from the previous branch onto this one. Need to add/remake the tests next.
…s. Need tests for the protein groups and the occupancy set up (currently called CalculateOccupancies).
…ts population) from SetUpQuantificationObjects method for now.
a874306 to
6389de1
Left a couple of review comments. Two things that jump out:
- Lack of summary comments. There are a lot of dictionaries in use, and I'm not always sure what the keys are just from looking at them. It would be nice to have summary comments that explain the dictionaries.
- IdWithMotif doesn't include the modification type (the part before the colon).
{
    public Dictionary<string, QuantifiedProteinGroup> ProteinGroups { get; private set; }

    //public Dictionary<string, (QuantifiedPeptide QuantifiedPeptide, string ProteinGroups)> Peptides { get; private set; }
Remove the commented-out property.
/// all of the amino acids in that peptide.</returns>
///
public void SetUpQuantificationObjectsFromFullSequences(List<(string fullSeq, List<string> proteinGroups, double intensity)> peptides, Dictionary<string, string> proteinSequences = null)
{
Instead of passing in a list of tuples, have you considered making a lightweight class to hold that information? Like a record class that stores the sequence, protein groups, and intensity. Also, it's not clear what the proteinGroups are and what information they contain. In IQuantifiableRecord, a tuple stores accessions, gene names, and organisms for the different protein groups. Just using Accessions would probably work as well, but I would like guidance on what the proteinGroups actually are
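As a rough illustration of the suggestion above, a lightweight record could replace the tuple. This is only a sketch: the record name `QuantifiedPeptideInput` and its member names are hypothetical, and treating the protein groups as a list of accessions is an assumption, since what `proteinGroups` should contain is exactly the open question in this comment.

```csharp
using System.Collections.Generic;

// Hypothetical record (not part of the PR). It names the three values that
// the tuple currently carries positionally.
public sealed record QuantifiedPeptideInput(
    string FullSequence,                          // full sequence including modification annotations
    IReadOnlyList<string> ProteinGroupAccessions, // assumption: protein groups identified by accession
    double Intensity);                            // quantified intensity for this full sequence
```

A caller would then write something like `new QuantifiedPeptideInput(fullSeq, accessions, intensity)`, and the method signature could take a `List<QuantifiedPeptideInput>`, which documents the meaning of each field at the call site.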
{
    ProteinGroups[pg] = new QuantifiedProteinGroup(pg);
}
var proteinGroup = ProteinGroups[pg];
If there are actually multiple protein groups associated with the peptide, should we store the combined protein group in the dictionary? What would happen if we split first, as in line 40, then added them to the dictionary?
public string BaseSequence { get; set; }
public QuantifiedProtein ParentProtein { get; set; }
public int OneBasedStartIndexInProtein { get; set; }
public Dictionary<int, Dictionary<string, QuantifiedModification>> ModifiedAminoAcidPositions { get; set; }
What string serves as the key in the <string, QuantMod> dictionary?
If the string just stores position, could the ModifiedAminoAcidPositions just be an <int, List> Dictionary? The position string seems redundant.
    peptide.OneBasedStartIndexInProtein = Sequence.IndexOf(peptide.BaseSequence) + 1;
}
// if peptide has no modifications, add to all its positions
if (!peptide.ModifiedAminoAcidPositions.IsNotNullOrEmpty())
IsNullOrEmpty() would be slightly cleaner
public string Name { get; set; }
public Dictionary<string, QuantifiedProtein> Proteins { get; set; }

public QuantifiedProteinGroup(string name, Dictionary<string, QuantifiedProtein> proteins = null)
What is the string key here?
[Test]
public void TestQuantifiedModification()
{
    var quantmod = new QuantifiedModification(idWithMotif: "TestMod: ModX on AAY", positionInPeptide: 1, positionInProtein: 2, intensity: 10);
IdWithMotif just refers to the part after the colon. The full mod string is {Modification Type}: {Id with motif}
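The distinction in this comment can be sketched as a split at the first colon of the full mod string, following the `{Modification Type}: {Id with motif}` layout described above. The helper name `ModStringSketch.Split` is hypothetical and not part of mzLib. It assumes the modification type itself contains no colon.

```csharp
using System;

public static class ModStringSketch
{
    // Hypothetical helper (not an mzLib API): splits a full mod string of the
    // form "{Modification Type}: {Id with motif}" at the first colon.
    public static (string ModificationType, string IdWithMotif) Split(string fullModString)
    {
        int colon = fullModString.IndexOf(':');
        if (colon < 0)
        {
            throw new FormatException($"No modification type prefix in '{fullModString}'.");
        }
        // Everything before the colon is the type; everything after is the id with motif.
        return (fullModString.Substring(0, colon).Trim(),
                fullModString.Substring(colon + 1).Trim());
    }
}
```

Applied to the string used in the test, `Split("TestMod: ModX on AAY")` yields the type `TestMod` and the id with motif `ModX on AAY`, so under this reading only the second part belongs in an `idWithMotif` argument.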
…tide input for setting up the protein groups and the quantifications.
New PR with changes from PR #797 and more.
Description:
PTM stoichiometry classes were created for each structural level (protein group -> protein -> (base) peptide -> modification). Probably the most important class to understand is the QuantifiedPeptide class, whose objects each represent a collection of post-translationally modified variants sharing the same base sequence. This is in contrast with other Peptide classes, which represent a peptide with a given full sequence. The QuantifiedProtein class, which stores the QuantifiedPeptide objects, handles peptide-to-protein indexing and obtains the modification stoichiometry for the protein. The QuantifiedModification class is primarily a data class, and the QuantifiedProteinGroup class is mostly a container for the different QuantifiedProtein members of the protein group. Lastly, the PositionFrequencyAnalysis class has a method that takes a list of (full sequence, protein group list, intensity) tuples and creates a collection of QuantifiedProteinGroup objects.
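The idea that one object stands for all modified variants of a base sequence can be sketched as follows. This is not the PR's implementation: `BaseSequenceSketch` and its methods are hypothetical, the square-bracket annotation format is an assumption about how full sequences are written, and terminal-modification notation is not handled.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BaseSequenceSketch
{
    // Assumption: modifications are annotated in square brackets inside the
    // full sequence. Nesting depth is tracked so that brackets occurring
    // inside a modification name are skipped as well.
    public static string StripModifications(string fullSequence)
    {
        var sb = new StringBuilder();
        int depth = 0;
        foreach (char c in fullSequence)
        {
            if (c == '[') depth++;
            else if (c == ']') { if (depth > 0) depth--; }
            else if (depth == 0) sb.Append(c);
        }
        return sb.ToString();
    }

    // Groups full sequences under their base sequence and sums the intensity
    // of each full sequence: base sequence -> (full sequence -> intensity).
    public static Dictionary<string, Dictionary<string, double>> GroupByBaseSequence(
        IEnumerable<(string fullSeq, double intensity)> peptides)
    {
        var result = new Dictionary<string, Dictionary<string, double>>();
        foreach (var (fullSeq, intensity) in peptides)
        {
            string baseSeq = StripModifications(fullSeq);
            if (!result.TryGetValue(baseSeq, out var variants))
            {
                variants = new Dictionary<string, double>();
                result[baseSeq] = variants;
            }
            variants.TryGetValue(fullSeq, out double current); // 0 if not yet present
            variants[fullSeq] = current + intensity;
        }
        return result;
    }
}
```

In this sketch, a modified and an unmodified form of the same peptide end up under one base-sequence key, which mirrors the role the description gives to QuantifiedPeptide.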