PTM Stoichiometry #797
// get the localized modifications from the peptide full sequence and add any amino acid/modification combination not
// seen yet to the occupancy dictionary
foreach (KeyValuePair<int, List<string>> aaWithModList in peptideMods)
In situations like this, you can use "var aaWithModList" instead of specifying the actual type.
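Here is a minimal sketch of the suggestion. It assumes peptideMods is a Dictionary<int, List<string>>, as in the snippet above. The VarLoopSketch class and its CountMods helper are hypothetical and exist only to make the loop runnable. Because var infers the same KeyValuePair<int, List<string>> type, the loop body does not change.

```csharp
using System.Collections.Generic;

public static class VarLoopSketch
{
    // Hypothetical helper: counts all modifications in a peptideMods-style dictionary
    // (key = amino acid index, value = names of the modifications at that index).
    public static int CountMods(Dictionary<int, List<string>> peptideMods)
    {
        int count = 0;
        // The compiler infers KeyValuePair<int, List<string>> for aaWithModList.
        foreach (var aaWithModList in peptideMods)
        {
            count += aaWithModList.Value.Count;
        }
        return count;
    }
}
```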
I think readers/Quant... is the best place for it. That way it can be used to find occupancy of the results from another software should that be desired.
To optimize the inputs and outputs of the function, you should break your test method into two: one test method that reads in all the data you need, and another method (not a test method) that gets called to calculate the occupancy. This will help you better understand what the method needs, and it will help us make recommendations.
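One way that split could look in C# is sketched here. The class and method names are hypothetical. The header names "Full Sequence" and "Peak intensity" are assumptions about the quantified peaks file and have not been checked against FlashLFQ's writer. ParsePeaks is the only part that deals with file contents; a test would pass it File.ReadLines(path). ModifiedIntensityFraction is a pure calculation, so it can be unit-tested with a few hand-written peaks. It is deliberately simplified to a peptide-level fraction, not site occupancy.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class OccupancyWorkflowSketch
{
    // Reading step: turn tab-separated lines (header first) into (FullSequence, Intensity) pairs.
    // The header names below are assumptions about the quantified peaks file.
    public static List<(string FullSequence, double Intensity)> ParsePeaks(IEnumerable<string> lines)
    {
        var peaks = new List<(string FullSequence, double Intensity)>();
        int seqCol = -1, intensityCol = -1;
        bool isHeader = true;
        foreach (string line in lines)
        {
            string[] fields = line.Split('\t');
            if (isHeader)
            {
                seqCol = Array.IndexOf(fields, "Full Sequence");
                intensityCol = Array.IndexOf(fields, "Peak intensity");
                if (seqCol < 0 || intensityCol < 0)
                    throw new FormatException("Expected 'Full Sequence' and 'Peak intensity' columns.");
                isHeader = false;
                continue;
            }
            peaks.Add((fields[seqCol], double.Parse(fields[intensityCol], CultureInfo.InvariantCulture)));
        }
        return peaks;
    }

    // Calculation step: no I/O, so it is easy to test and to reuse with other readers.
    // Returns the fraction of total peak intensity that comes from peptides whose
    // full sequence contains the given modification name.
    public static double ModifiedIntensityFraction(
        IEnumerable<(string FullSequence, double Intensity)> peaks, string modName)
    {
        double total = 0, modified = 0;
        foreach (var (fullSequence, intensity) in peaks)
        {
            total += intensity;
            if (fullSequence.Contains(modName)) modified += intensity;
        }
        return total > 0 ? modified / total : 0.0;
    }
}
```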
Requesting a second round of reviews! The second-to-last commit describes most of the changes in a little more detail. Currently pending work is to create a small enough subset of the raw data to create a test similar to the … I'd be happy to hear about 1) code optimization, 2) the currently written tests, and 3) clarifications on code commenting. In a conversation, Nic suggested using objects for my main PTM calculation code rather than the 5-level-deep dictionary; thoughts on that would be useful as well. Of course, anything else is useful too. TIA!
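This C# sketch shows one possible shape for the object-based alternative Nic suggested. SiteOccupancy, ProteinOccupancy, and the 1-based position convention are hypothetical, not types from mzLib or from this PR. The idea is that each nesting level of the dictionary becomes a named type, and the occupancy ratio sits next to the data it uses.

```csharp
using System.Collections.Generic;

// Hypothetical type: one modifiable residue in a protein.
public class SiteOccupancy
{
    public int ProteinPosition { get; }                  // assumed 1-based
    public char Residue { get; }
    public double TotalIntensity { get; private set; }   // modified + unmodified peptides covering the site
    public Dictionary<string, double> ModIntensities { get; } = new Dictionary<string, double>();

    public SiteOccupancy(int proteinPosition, char residue)
    {
        ProteinPosition = proteinPosition;
        Residue = residue;
    }

    public void AddCoverage(double intensity) => TotalIntensity += intensity;

    public void AddModIntensity(string modName, double intensity)
    {
        ModIntensities.TryGetValue(modName, out double current); // current is 0 if absent
        ModIntensities[modName] = current + intensity;
    }

    // Occupancy of one modification = its intensity / total intensity observed at the site.
    public double Occupancy(string modName) =>
        TotalIntensity > 0 && ModIntensities.TryGetValue(modName, out double modIntensity)
            ? modIntensity / TotalIntensity
            : 0.0;
}

// Hypothetical type: all tracked sites of one protein, keyed by position.
public class ProteinOccupancy
{
    public string Accession { get; }
    public Dictionary<int, SiteOccupancy> Sites { get; } = new Dictionary<int, SiteOccupancy>();

    public ProteinOccupancy(string accession) => Accession = accession;

    public SiteOccupancy GetOrAddSite(int position, char residue)
    {
        if (!Sites.TryGetValue(position, out var site))
        {
            site = new SiteOccupancy(position, residue);
            Sites[position] = site;
        }
        return site;
    }
}
```

With a Dictionary<string, ProteinOccupancy> keyed by accession, one typed lookup replaces a chain of dictionary indexers. Totals also no longer share a key space with modification names.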
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #797 +/- ##
==========================================
+ Coverage 77.78% 77.83% +0.04%
==========================================
Files 229 230 +1
Lines 34159 34351 +192
Branches 3539 3570 +31
==========================================
+ Hits 26570 26736 +166
- Misses 6985 7009 +24
- Partials 604 606 +2
mzLib/MzLibUtil/ClassExtensions.cs
{
    // use a regex to get all modifications
    string pattern = @"\[(.+?)\]";
    Regex regex = new(pattern);
We need to make sure that this method never accidentally identifies [hydroxylation]EPT[phospho] as a mod in P[hydroxylation]EPT[phospho]IDE.
I'm not sure your regex won't skip over the ]EPT[ in between.
After finding an opening bracket, the regex always stops at the next closing bracket, except (now updated) when that closing bracket belongs to an ion charge state.
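This is a quick C# check of the behavior discussed in this thread, using the pattern from the snippet. The ModRegexSketch class is hypothetical. The lazy quantifier .+? ends each match at the first ] after a [, so ]EPT[ is never captured. The greedy variant is included only for contrast. With this exact pattern, a modification name that itself contains ] would be cut short.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ModRegexSketch
{
    // Pattern from the snippet under review: "[", then the shortest possible run of
    // characters, then "]". Capture group 1 is the modification name.
    private static readonly Regex LazyModPattern = new Regex(@"\[(.+?)\]");

    // Greedy variant, for contrast only: ".+" runs to the LAST "]" in the string.
    private static readonly Regex GreedyModPattern = new Regex(@"\[(.+)\]");

    public static string[] LazyModNames(string fullSequence) =>
        LazyModPattern.Matches(fullSequence).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

    public static string[] GreedyModNames(string fullSequence) =>
        GreedyModPattern.Matches(fullSequence).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
}
```

For P[hydroxylation]EPT[phospho]IDE, LazyModNames returns hydroxylation and phospho. GreedyModNames returns the single string hydroxylation]EPT[phospho.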
namespace MzLibUtil
{
    // Should this have all of the parent data (i.e. protein group, protein, peptide, peptide position)? Unnecessary for now, but probably useful later.
    public class UtilModification
UtilModification => LocalizedModificationFromTsv
modName => IdWithMotif
position => PeptidePositionZeroIsNterminus
{
    public string FullSequence { get; set; }
    public string BaseSequence { get; set; }
    public UtilProtein ParentProtein { get; set; }
maybe this should be ProteinGroup?
    }
}
public class UtilProtein
FlashLFQ ProteinGroup
Some pending changes: Side Note:
542f959 to 0184816
…. Changed the BioPolymerWithSetModsExtensions class to write full sequences separating the C-terminus with a dash. Updated some of the tests that failed because of the new notation of C-terminus mods. Some tests are still failing, and will be updated once happy with this general change.
…t handle ambiguity (or multiple mods at the same position). Modified the corresponding tests or commented them out in case we want to revert.
…ve amino acid positions depending on the length of the modification string and its index. The current approach fixes that.
…sitionFrequencyAnalysis UtilProtein class (now updates peptide mod positions to protein positions) and PFA argument (list of named tuple for clarity)
…ing to master and matching content
…d of FlashLFQ to output occupancy. Updated UtilClasses for correct UtilProtein.ModifiedAminoAcidPositionsInProtein positions.
…g work but untested. WIP.
…urrently does not adequately identify N-terminus mods. Make sure UtilProtein.SetProteinModsFromPeptides correctly adds terminal protein mods. Saving, but needs more rigorous testing once ParseModifications is updated (in a separate PR) to correctly parse N-terminus mods. WIP
…s sequences, since it covers most but not many interesting cases. Best to remove it to maintain code coverage. I will add some notes on the issue on the PR for future reference.
… not finished and there are errors.
…tion of occupancy implementation and fixed quantifiedprotein and quantifiedpeptide classes. Might need to update the inputs of the occupancy calculation to handle positions in proteins.
…ccupancies. More refactoring done. ProteinGroup occupancy seems to be working. Still polishing peptide occupancy. MetaMorpheus branch updated to work with the PG occupancy refactoring. Will add peptide occupancy output to the MetaMorpheus branch.
…g peptides from different experiments.
…corporating updated. N-terminus mod writing.
Stale, and updated in a new branch with an updated and clean commit history. The pull request addressing these changes (and more) is now #955.

Creating an mzLib method to calculate the stoichiometry (or site occupancy) of PTMs using the intensity of each quantified peak. The current inputs are the protein database (.xml) file path(s) and the AllQuantifiedPeaks.tsv file path. The output, occupancyDict, is currently a dictionary of nested dictionaries with the following structure:
where PROTEINX is the protein accession, MAAX is the modified amino acid at protein position X, and MODNAME1 is the full label of the modification. For each MAAX, there is a "Total" key (instead of a modification name) that holds the total intensity of that amino acid measured in the quantified peaks file, including modified and unmodified peptides with that specific residue.
The general approach is to first get all of the modification intensities and record those in occupancyDict, while storing in proteinSeqRangesSeen a dictionary with protein accession keys and values stored as a list of (STARTINDEX, ENDINDEX, INTENSITY) tuples. This keeps track of the index ranges seen for each protein. Once we have parsed all of the mods, for every amino acid falling into any of those ranges, we increase its "Total" intensity by that range's intensity.
From our discussion, I've added below some of the items I'd like to get opinions about. I made them a task list primarily for me to keep track of what I've figured out.
FlashLFQResults and Readers/QuantificationResults.
Thanks in advance!
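The two-pass approach in the PR description above can be sketched in C# as follows. This illustrates the description under stated assumptions and is not the PR's implementation. It assumes each peptide is already mapped to a 1-based start position in its protein and its modifications are already localized, so no full-sequence parsing is shown. PeakSketch, OccupancySketch, sitePositions, and the "T4"-style site key are hypothetical. Only occupancyDict, proteinSeqRangesSeen, the (start, end, intensity) tuples, and the "Total" key come from the description.

```csharp
using System.Collections.Generic;

// Hypothetical input: one quantified peak whose peptide is mapped to a 1-based start
// position in its protein, with localized mods given as 0-based peptide offsets.
public record PeakSketch(
    string Protein, int StartInProtein, string BaseSequence,
    List<(int Offset, string ModName)> Mods, double Intensity);

public static class OccupancySketch
{
    public const string TotalKey = "Total";

    // Returns occupancyDict[PROTEINX][MAAX][MODNAME or "Total"] = summed intensity.
    public static Dictionary<string, Dictionary<string, Dictionary<string, double>>> Calculate(
        IEnumerable<PeakSketch> peaks)
    {
        var occupancyDict = new Dictionary<string, Dictionary<string, Dictionary<string, double>>>();
        var proteinSeqRangesSeen = new Dictionary<string, List<(int Start, int End, double Intensity)>>();
        // Protein position of each MAAX key, so it never has to be parsed back out of the string.
        var sitePositions = new Dictionary<(string Protein, string Maa), int>();

        // Pass 1: record modification intensities and the protein range each peak covers.
        foreach (var peak in peaks)
        {
            int end = peak.StartInProtein + peak.BaseSequence.Length - 1;
            if (!proteinSeqRangesSeen.TryGetValue(peak.Protein, out var ranges))
                proteinSeqRangesSeen[peak.Protein] = ranges = new List<(int Start, int End, double Intensity)>();
            ranges.Add((peak.StartInProtein, end, peak.Intensity));

            if (!occupancyDict.TryGetValue(peak.Protein, out var sites))
                occupancyDict[peak.Protein] = sites = new Dictionary<string, Dictionary<string, double>>();

            foreach (var (offset, modName) in peak.Mods)
            {
                int position = peak.StartInProtein + offset;
                string maa = $"{peak.BaseSequence[offset]}{position}"; // e.g. "T4"
                if (!sites.TryGetValue(maa, out var mods))
                {
                    sites[maa] = mods = new Dictionary<string, double> { [TotalKey] = 0.0 };
                    sitePositions[(peak.Protein, maa)] = position;
                }
                mods.TryGetValue(modName, out double current); // current is 0 if absent
                mods[modName] = current + peak.Intensity;
            }
        }

        // Pass 2: every range that covers a modified site adds its intensity to that site's "Total",
        // whether or not the peptide in that range carried the modification.
        foreach (var (protein, sites) in occupancyDict)
        {
            foreach (var (maa, mods) in sites)
            {
                int position = sitePositions[(protein, maa)];
                foreach (var (start, end, intensity) in proteinSeqRangesSeen[protein])
                {
                    if (start <= position && position <= end)
                        mods[TotalKey] += intensity;
                }
            }
        }
        return occupancyDict;
    }
}
```

Dividing occupancyDict[protein][site][modName] by the same site's "Total" gives the site occupancy. Because "Total" shares a key space with modification names, a modification literally named "Total" would collide with it. This is one argument for the object-based structure raised in the review.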