Changes from 6 commits
140 commits
650f2fa
d
trishorts Sep 17, 2025
132b6ca
its broke
trishorts Sep 17, 2025
aed5b13
string unneccessary lines from xml
trishorts Sep 17, 2025
afb11e2
eliminates conflicting variant combinations and ignores variants on a…
trishorts Sep 18, 2025
d3f8ab2
eliminate unused code
trishorts Sep 18, 2025
87d1bd1
enable return of base proteins from xml with no applied sequence vari…
trishorts Sep 18, 2025
4720899
Merge branch 'master' into whatsWrongWithVariantApplicationAndXml
trishorts Sep 18, 2025
71543b0
nucleotide substitution modifications converted to squence variants d…
trishorts Sep 22, 2025
3f90baf
correctly set max variants per isoform and max isoform variants with …
trishorts Sep 22, 2025
312a385
still trouble correctly handling VCF sequence variants
trishorts Sep 22, 2025
197a4f8
add extensive comments to SequenceVariantDescription
trishorts Sep 24, 2025
241d2a6
comprehensive tests for VariantCallFormat formerly called SequenceVar…
trishorts Sep 25, 2025
b2fb89f
Merge branch 'whatsWrongWithVariantApplicationAndXml' of https://gith…
trishorts Sep 25, 2025
b09b482
update VariantCallFormat to improve handling of zygosity and added co…
trishorts Sep 25, 2025
6ca8a19
new code that splits sequence variants by genotypes and new code that…
trishorts Sep 26, 2025
7f94034
we now have one method to apply all variants
trishorts Sep 26, 2025
6da8776
fixed vcf constructor test
trishorts Sep 26, 2025
941da42
slow progress
trishorts Sep 26, 2025
84dc0a8
increment
trishorts Sep 26, 2025
f7e3c7e
IndelDecoyVariants passes
trishorts Sep 26, 2025
5c391e5
multiple alternate alleles passes
trishorts Sep 26, 2025
ffd6c5c
multiple alternate frameshifts passes
trishorts Sep 26, 2025
f83491b
stop gained passes
trishorts Sep 26, 2025
fab8922
stable
trishorts Sep 30, 2025
31d430b
fixed homozygous variants
trishorts Sep 30, 2025
dfbcb66
load seq var mods fixed
trishorts Sep 30, 2025
3d3f335
variant symbol weirdness fixed
trishorts Sep 30, 2025
d94c9d7
VariantXml fixed
trishorts Sep 30, 2025
3507d55
one more variant depth change
trishorts Sep 30, 2025
c193b50
test full protein read write fixed
trishorts Sep 30, 2025
2106390
RNA two truncations fixed
trishorts Sep 30, 2025
18ddc71
b
trishorts Sep 30, 2025
d41b4b3
applied variants fixed
trishorts Sep 30, 2025
5533480
indel decoy variants
trishorts Sep 30, 2025
8557e13
multiple alternate alleles
trishorts Sep 30, 2025
bc074dc
stop gained
trishorts Sep 30, 2025
96e5ca2
variantxml
trishorts Sep 30, 2025
40927bc
homozygous variants
trishorts Sep 30, 2025
b44837d
load seq var
trishorts Sep 30, 2025
ccce660
test identify and string
trishorts Sep 30, 2025
fd8c528
test hash
trishorts Sep 30, 2025
34572ad
compare protein properties
trishorts Sep 30, 2025
43d3b85
testing comlete xmls
trishorts Sep 30, 2025
c685909
progress
trishorts Sep 30, 2025
7dfa78c
no progress
trishorts Sep 30, 2025
6fea76c
before changes to proteindbwriter
trishorts Sep 30, 2025
b7c725f
okay don't sneeze
trishorts Sep 30, 2025
4a6a4b7
is this even real
trishorts Oct 1, 2025
0fa3981
better
trishorts Oct 1, 2025
d2e38eb
n
trishorts Oct 1, 2025
2902d61
f
trishorts Oct 1, 2025
8921cd0
dont sneeze bro
trishorts Oct 1, 2025
5e8a6e4
intersects sequence variations
trishorts Oct 1, 2025
f82ba62
test hash fixed
trishorts Oct 1, 2025
b1219b6
applied variants as i biopolymer
trishorts Oct 1, 2025
3243e32
indel decoy
trishorts Oct 1, 2025
41b439d
g
trishorts Oct 1, 2025
a8c8207
cool
trishorts Oct 1, 2025
cfb9dcc
green light baby
trishorts Oct 1, 2025
15fe961
Merge remote-tracking branch 'upstream/master' into whatsWrongWithVar…
trishorts Oct 1, 2025
a4ca06d
sentize variants test start
trishorts Oct 2, 2025
0e1431c
sanitize variant data test complete
trishorts Oct 2, 2025
3114199
combine descriptions unit test
trishorts Oct 2, 2025
7142dc0
test nucleotide substitution modification to sequence variant
trishorts Oct 2, 2025
7d60e22
variant application tests
trishorts Oct 2, 2025
3ece78d
nice
trishorts Oct 2, 2025
3a24d34
apply variants tests
trishorts Oct 2, 2025
171ac85
sanitize start
trishorts Oct 2, 2025
d623db1
k
trishorts Oct 2, 2025
ff07a18
good
trishorts Oct 2, 2025
7941caf
good
trishorts Oct 2, 2025
1ed8f2a
sanitize covered
trishorts Oct 2, 2025
0e0ebd6
mid
trishorts Oct 2, 2025
932a854
get biopolymers comprehensive tests
trishorts Oct 2, 2025
7a6a325
truncation product tests
trishorts Oct 2, 2025
b12abbf
test sequence variation indicies
trishorts Oct 2, 2025
ab986ac
test apply single variant
trishorts Oct 2, 2025
1b2bd69
split per genotype tests begin
trishorts Oct 2, 2025
33690ac
split per genotypes tests complete
trishorts Oct 2, 2025
e9a4482
eliminate unused code in SnpEffAnnotation
trishorts Oct 2, 2025
7f3c68c
put it back
trishorts Oct 2, 2025
fcdd4a2
protein db writer testss
trishorts Oct 3, 2025
4c6a4f4
RNA decoy tests
trishorts Oct 3, 2025
79929b5
copilot code review recommended changes
trishorts Oct 3, 2025
77a0cfe
will need to migrate metamorpheus calles to sequence variant description
trishorts Oct 3, 2025
56a1cee
fix copilot reported issues
trishorts Oct 6, 2025
4ae1030
dbloader tests
trishorts Oct 6, 2025
6702ad7
additional sequence variation tests
trishorts Oct 6, 2025
7842e09
peptide with set mods tests
trishorts Oct 7, 2025
48575d6
j
trishorts Oct 9, 2025
7739a00
fix unit tests that broke because of differences in the order that pr…
trishorts Oct 9, 2025
c1e8ee9
its cool
trishorts Oct 9, 2025
13b7ba9
updates to make rna db writer consistant w/ protein db writer
trishorts Oct 9, 2025
2f7a061
i dont believe that it actually works
trishorts Oct 9, 2025
2325fde
correctly write proteins with applied variants
trishorts Oct 9, 2025
68dabc2
collapse proteins with the same accession derived from application of…
trishorts Oct 9, 2025
814c193
test collapse proteins with applied variants having the same accessio…
trishorts Oct 9, 2025
d4a2ebb
change LoadProteinXML defaults so no variants get loaded by default a…
trishorts Oct 9, 2025
fc3b55a
fix more unit tests that fail due to changes in the default values
trishorts Oct 9, 2025
ec85f4c
repair more broken unit tests
trishorts Oct 9, 2025
33d10e6
no beta
trishorts Oct 9, 2025
9076e5d
plus four
trishorts Oct 9, 2025
db5b313
yay
trishorts Oct 9, 2025
bf9ae07
h
trishorts Oct 9, 2025
a1eb4a0
amended default to return canonical in proteindbreader
trishorts Oct 9, 2025
3261850
lots of updates for decoys of proteins with sequence variants
trishorts Oct 13, 2025
fc87bc9
delete unused code
trishorts Oct 13, 2025
75c0631
f
trishorts Oct 14, 2025
dd07eaf
g
trishorts Oct 14, 2025
b115053
stuff is working
trishorts Oct 14, 2025
b500d86
gs
trishorts Oct 15, 2025
af61107
remove local reference
trishorts Oct 15, 2025
895766f
extend variant application changes to rna and writing and reading of …
trishorts Oct 20, 2025
4f9d714
test rna with truncation
trishorts Oct 20, 2025
060a244
update xmlentry and dbloader to read decoys
trishorts Oct 20, 2025
abc110d
x
trishorts Oct 20, 2025
4c96cbc
intermediate save
trishorts Oct 20, 2025
f2a8c82
keep organism is merge proteins
trishorts Oct 20, 2025
4726017
manually merged nics decoy stuff
trishorts Oct 20, 2025
c2c3f05
merge upstream
trishorts Oct 20, 2025
966a65a
nics stuff is in
trishorts Oct 20, 2025
9c52cee
legacy load protein xml
trishorts Oct 20, 2025
2aa7a09
test for convert nucleotide substitution mod in loadproteinxml
trishorts Oct 21, 2025
2c8c0ba
test legacy loadproteinxml
trishorts Oct 21, 2025
136225c
covering some variant lines in peptide with set mods
trishorts Oct 21, 2025
b06ed98
update nuspec
trishorts Oct 21, 2025
1020526
organization
trishorts Oct 21, 2025
085116c
Merge remote-tracking branch 'upstream/master' into whatsWrongWithVar…
trishorts Oct 21, 2025
4319338
j
trishorts Oct 22, 2025
c4d52e8
tests
trishorts Oct 22, 2025
8292548
okay so peptide with set mods intersects was bugged
trishorts Oct 22, 2025
0618dce
ok cool
trishorts Oct 22, 2025
a2eb07c
unused code
trishorts Oct 22, 2025
663d07e
summary comments
trishorts Oct 22, 2025
01b9de2
g
trishorts Oct 22, 2025
6029fda
comments
trishorts Oct 22, 2025
d90c2de
Merge remote-tracking branch 'upstream/master' into whatsWrongWithVar…
trishorts Oct 22, 2025
ed48f63
return comments to snpeff
trishorts Oct 22, 2025
4dfbd11
rename maxIsoforms
trishorts Oct 22, 2025
3d73744
better not perfect
trishorts Oct 22, 2025
37 changes: 37 additions & 0 deletions mzLib/Omics/BioPolymer/VariantApplication.cs
Member

The fact that every single function is entirely rewritten is very concerning. Was all of this necessary to create the desired changes?

@@ -23,6 +23,11 @@ public static class VariantApplication
public static List<TBioPolymerType> GetVariantBioPolymers<TBioPolymerType>(this TBioPolymerType protein, int maxAllowedVariantsForCombinatorics = 4, int minAlleleDepth = 1)
where TBioPolymerType : IHasSequenceVariants
{
if(maxAllowedVariantsForCombinatorics == 0)
Copilot AI Sep 18, 2025

Missing space after 'if' keyword violates C# formatting conventions.

Suggested change
if(maxAllowedVariantsForCombinatorics == 0)
if (maxAllowedVariantsForCombinatorics == 0)

{
// if no combinatorics allowed, just return the base protein
return new List<TBioPolymerType> { protein };
}
protein.ConsensusVariant.ConvertNucleotideSubstitutionModificationsToSequenceVariants();
protein.ConvertNucleotideSubstitutionModificationsToSequenceVariants();
if (protein.SequenceVariations.All(v => v.AreValid()) && protein.SequenceVariations.Any(v => v.Description == null || v.Description.Genotypes.Count == 0))
@@ -457,6 +462,8 @@ public static IEnumerable<TBioPolymerType> ApplyAllVariantCombinations<TBioPolym
{
foreach (var combo in GetCombinations(variations, size))
{
if(!ValidCombination(combo.ToList()))
Copilot AI Sep 18, 2025

Missing space after 'if' keyword violates C# formatting conventions.

Suggested change
if(!ValidCombination(combo.ToList()))
if (!ValidCombination(combo.ToList()))

Copilot AI Sep 18, 2025

Calling ToList() on each combination creates unnecessary allocations. Consider changing ValidCombination to accept IEnumerable instead of List.

Suggested change
if(!ValidCombination(combo.ToList()))
if(!ValidCombination(combo))

continue;
var result = baseBioPolymer;
foreach (var variant in combo)
{
@@ -503,6 +510,36 @@ private static IEnumerable<IList<SequenceVariation>> GetCombinations(List<Sequen
indices[i] = indices[i - 1] + 1;
}
}
public static bool ValidCombination(List<SequenceVariation> variations)
{
if (variations.Count <= 1)
return true;

// Validate inputs
for (int i = 0; i < variations.Count; i++)
{
var v = variations[i];
if (v == null || !v.AreValid())
return false;
}

// Sort by begin then end, then check only adjacent intervals
var ordered = variations
.OrderBy(v => v.OneBasedBeginPosition)
.ThenBy(v => v.OneBasedEndPosition)
.ToList();

var prev = ordered[0];
for (int i = 1; i < ordered.Count; i++)
{
var curr = ordered[i];
if (prev.Intersects(curr)) // inclusive overlap check
return false;

prev = curr;
}
return true;
}
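Reviewer note: a minimal sketch of how the allocation comment on `ValidCombination` could be addressed without changing the sort-then-compare-neighbours logic. `Interval` and `IntervalChecks.NoOverlaps` are hypothetical stand-ins for `SequenceVariation` and `ValidCombination`, not mzLib types, and the inclusive overlap test is only assumed to match what `Intersects` does. Checking adjacent pairs is sufficient because the intervals are sorted by begin position and the loop stops at the first overlap: if every neighbouring pair is disjoint, each interval ends before the next one begins, so no earlier interval can reach a later one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SequenceVariation: a one-based, inclusive residue range.
public readonly record struct Interval(int Begin, int End)
{
    // Inclusive overlap; assumed (not confirmed) to mirror SequenceVariation.Intersects.
    public bool Intersects(Interval other) => Begin <= other.End && other.Begin <= End;
}

public static class IntervalChecks
{
    // Accepts any sequence, so callers do not have to call ToList() per combination.
    public static bool NoOverlaps(IEnumerable<Interval> intervals)
    {
        // Sort by begin, then end; one pass over neighbours is then enough.
        var ordered = intervals.OrderBy(i => i.Begin).ThenBy(i => i.End).ToList();
        for (int i = 1; i < ordered.Count; i++)
        {
            if (ordered[i - 1].Intersects(ordered[i]))
                return false;
        }
        return true;
    }
}
```

The sorted copy is still allocated once inside the method, so this removes only the caller-side `ToList()`, not every allocation.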
public static void ConvertNucleotideSubstitutionModificationsToSequenceVariants<TBioPolymerType>(this TBioPolymerType protein)
where TBioPolymerType : IHasSequenceVariants
{
2 changes: 1 addition & 1 deletion mzLib/Omics/Modifications/Modification.cs
@@ -103,7 +103,7 @@ public Modification(string _originalId = null, string _accession = null, string
this.MonoisotopicMass = this.ChemicalFormula.MonoisotopicMass;
}
}

Contributor

Remove this

public static string ModLocationOnPeptideOrProtein(string _locationRestriction)
{
switch (_locationRestriction)
44 changes: 44 additions & 0 deletions mzLib/Test/DatabaseTests/TestProteomicsReadWrite.cs
@@ -40,7 +40,51 @@ public void ReadXmlNulls()
var ok = ProteinDbLoader.LoadProteinXML(Path.Combine(TestContext.CurrentContext.TestDirectory, "DatabaseTests", @"xml2.xml"), true, DecoyType.None,
null, false, null, out Dictionary<string, Modification> un);
}
[Test]
public void ReadSomeOldXmlWithLongSubstitutionThatHasAConflict()
{
//In this case, we have two different sequence variants. One is a long substitution, the other is a point mutation.
//If their positions didn't overlap, we would end up with four total protein sequences: the base protein, the protein with the long substitution,
//the protein with the point mutation, and the protein with both the long substitution and the point mutation.
//But because the point mutation falls within the range of the long substitution, the two cannot be combined, so we should end up with only three protein sequences:
//the base protein, the protein with the long substitution, and the protein with the point mutation.
string oldXmlPath = Path.Combine(TestContext.CurrentContext.TestDirectory, "DatabaseTests", @"longSubstitution.xml");
var psiModDeserialized = Loaders.LoadPsiMod(Path.Combine(TestContext.CurrentContext.TestDirectory, "PSI-MOD.obo2.xml"));
Dictionary<string, int> formalChargesDictionary = Loaders.GetFormalChargesDictionary(psiModDeserialized);
var uniprotPtms = Loaders.LoadUniprot(Path.Combine(TestContext.CurrentContext.TestDirectory, "ptmlist2.txt"), formalChargesDictionary).ToList();

List<Protein> ok = ProteinDbLoader.LoadProteinXML(oldXmlPath, true, DecoyType.None, uniprotPtms, false, null,
out Dictionary<string, Modification> un);
Assert.IsTrue(ok.Count == 3);
}
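Reviewer note: the expected count of three can be checked with a small sketch. It enumerates every subset of the applicable variants and keeps those whose members do not overlap; the empty subset corresponds to the base protein. `VariantSubsetCounter` is a hypothetical helper, not part of mzLib. The two ranges used below are the long substitution (83-166) and the point mutation (134) from `longSubstitution.xml`; that the remaining variant features, which carry no `<original>`/`<variation>` content, are not applied is an assumption inferred from the expected count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VariantSubsetCounter
{
    // Counts the subsets of one-based inclusive ranges in which no two ranges overlap.
    // The empty subset (the unmodified base sequence) is included in the count.
    // Intended for small inputs only: the mask is an int, so fewer than 31 ranges.
    public static int CountCompatibleSubsets(IReadOnlyList<(int Begin, int End)> ranges)
    {
        int count = 0;
        for (int mask = 0; mask < (1 << ranges.Count); mask++)
        {
            // Pick the ranges whose bit is set in the mask, sorted by begin position.
            var chosen = Enumerable.Range(0, ranges.Count)
                .Where(i => (mask & (1 << i)) != 0)
                .Select(i => ranges[i])
                .OrderBy(r => r.Begin)
                .ToList();

            bool compatible = true;
            for (int i = 1; i < chosen.Count && compatible; i++)
            {
                // Neighbours in sorted order are disjoint only if one ends before the next begins.
                compatible = chosen[i - 1].End < chosen[i].Begin;
            }
            if (compatible) count++;
        }
        return count;
    }
}
```

`VariantSubsetCounter.CountCompatibleSubsets(new[] { (83, 166), (134, 134) })` returns 3, while two non-overlapping ranges such as `(10, 20)` and `(134, 134)` give 4.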
[Test]
public void SequenceVariantRefersToAlternateIsoform()
{
//In this case, we have sequence variants that refer to an alternate isoform.
//We should still be able to load the protein, even if we don't have the alternate isoform sequence.
//For now, we ignore a sequence variant if we don't have the alternate isoform sequence.
string oldXmlPath = Path.Combine(TestContext.CurrentContext.TestDirectory, "DatabaseTests", @"sequenceVariantOnAlternateIsoform.xml");
var psiModDeserialized = Loaders.LoadPsiMod(Path.Combine(TestContext.CurrentContext.TestDirectory, "PSI-MOD.obo2.xml"));
Dictionary<string, int> formalChargesDictionary = Loaders.GetFormalChargesDictionary(psiModDeserialized);
var uniprotPtms = Loaders.LoadUniprot(Path.Combine(TestContext.CurrentContext.TestDirectory, "ptmlist2.txt"), formalChargesDictionary).ToList();

List<Protein> ok = ProteinDbLoader.LoadProteinXML(oldXmlPath, true, DecoyType.None, uniprotPtms, false, null,
out Dictionary<string, Modification> un);
Assert.IsTrue(ok.Count == 1);
}
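Reviewer note: in `sequenceVariantOnAlternateIsoform.xml` the two variant features carry `<location sequence="Q96J88-3">`, and their positions (338 and 399) lie beyond the 318-residue canonical sequence. A minimal sketch of how such features could be recognised with LINQ to XML follows. `IsoformVariantFilter` is a hypothetical helper; whether the mzLib loader keys on this attribute to skip the variants is an assumption, not something shown in this diff.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class IsoformVariantFilter
{
    // Default namespace declared on the <uniprot> root element of the test files.
    private static readonly XNamespace Up = "http://uniprot.org/uniprot";

    // Returns the ids of "sequence variant" features whose <location> has a
    // sequence attribute, i.e. whose coordinates refer to a named isoform
    // rather than to the canonical sequence of the entry.
    public static List<string> VariantIdsOnOtherIsoforms(string uniprotXml)
    {
        return XDocument.Parse(uniprotXml)
            .Descendants(Up + "feature")
            .Where(f => (string)f.Attribute("type") == "sequence variant")
            .Where(f => f.Element(Up + "location")?.Attribute("sequence") != null)
            .Select(f => (string)f.Attribute("id"))
            .ToList();
    }
}
```

Variants located on the canonical sequence have no `sequence` attribute on `<location>` and are therefore not returned.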
[Test]
public void ReadXmlSkipVariants()
{
//In this case, we have a couple of different sequence variants, but we don't want to apply any of them.
//Instead, we just want the base protein sequence with mods.
string oldXmlPath = Path.Combine(TestContext.CurrentContext.TestDirectory, "DatabaseTests", @"longSubstitution.xml");
var psiModDeserialized = Loaders.LoadPsiMod(Path.Combine(TestContext.CurrentContext.TestDirectory, "PSI-MOD.obo2.xml"));
Dictionary<string, int> formalChargesDictionary = Loaders.GetFormalChargesDictionary(psiModDeserialized);
var uniprotPtms = Loaders.LoadUniprot(Path.Combine(TestContext.CurrentContext.TestDirectory, "ptmlist2.txt"), formalChargesDictionary).ToList();

List<Protein> ok = ProteinDbLoader.LoadProteinXML(oldXmlPath, true, DecoyType.None, uniprotPtms, false, null,
out Dictionary<string, Modification> un, maxHeterozygousVariants: 0);
Assert.IsTrue(ok.Count == 1);
}
[Test]
public void Test_readUniProtXML_writeProteinXml()
{
146 changes: 146 additions & 0 deletions mzLib/Test/DatabaseTests/longSubstitution.xml
@@ -0,0 +1,146 @@
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/docs/uniprot.xsd">
<entry dataset="Swiss-Prot" created="2007-12-04" modified="2024-10-02" version="154" xmlns="http://uniprot.org/uniprot">
<accession>Q9H3J6</accession>
<accession>Q8WUC6</accession>
<name>MTRFR_HUMAN</name>
<protein>
<recommendedName>
<fullName evidence="17">Mitochondrial translation release factor in rescue</fullName>
</recommendedName>
</protein>
<gene>
<name evidence="18" type="primary">MTRFR</name>
<name type="ORF">C12orf65</name>
<name type="ORF">My030</name>
</gene>
<organism>
<name type="scientific">Homo sapiens</name>
<name type="common">Human</name>
<dbReference type="NCBI Taxonomy" id="9606"/>
<lineage>
<taxon>Eukaryota</taxon>
<taxon>Metazoa</taxon>
<taxon>Chordata</taxon>
<taxon>Craniata</taxon>
<taxon>Vertebrata</taxon>
<taxon>Euteleostomi</taxon>
<taxon>Mammalia</taxon>
<taxon>Eutheria</taxon>
<taxon>Euarchontoglires</taxon>
<taxon>Primates</taxon>
<taxon>Haplorrhini</taxon>
<taxon>Catarrhini</taxon>
<taxon>Hominidae</taxon>
<taxon>Homo</taxon>
</lineage>
</organism>
<proteinExistence type="evidence at protein level"/>
<keyword id="KW-0002">3D-structure</keyword>
<keyword id="KW-0025">Alternative splicing</keyword>
<keyword id="KW-0175">Coiled coil</keyword>
<keyword id="KW-0225">Disease variant</keyword>
<keyword id="KW-0890">Hereditary spastic paraplegia</keyword>
<keyword id="KW-0488">Methylation</keyword>
<keyword id="KW-0496">Mitochondrion</keyword>
<keyword id="KW-0523">Neurodegeneration</keyword>
<keyword id="KW-1274">Primary mitochondrial disease</keyword>
<keyword id="KW-0648">Protein biosynthesis</keyword>
<keyword id="KW-1267">Proteomics identification</keyword>
<keyword id="KW-1185">Reference proteome</keyword>
<keyword id="KW-0694">RNA-binding</keyword>
<keyword id="KW-0809">Transit peptide</keyword>
<feature type="transit peptide" description="Mitochondrion" evidence="2">
<location>
<begin position="1"/>
<end position="35"/>
</location>
</feature>
<feature type="chain" id="PRO_0000311835" description="Mitochondrial translation release factor in rescue">
<location>
<begin position="36"/>
<end position="166"/>
</location>
</feature>
<feature type="region of interest" description="GGQ domain" evidence="1">
<location>
<begin position="57"/>
<end position="121"/>
</location>
</feature>
<feature type="region of interest" description="Disordered" evidence="3">
<location>
<begin position="122"/>
<end position="148"/>
</location>
</feature>
<feature type="coiled-coil region" evidence="2">
<location>
<begin position="127"/>
<end position="160"/>
</location>
</feature>
<feature type="short sequence motif" description="GGQ" evidence="9">
<location>
<begin position="71"/>
<end position="73"/>
</location>
</feature>
<feature type="modified residue" description="N5-methylglutamine" evidence="15">
<location>
<position position="73"/>
</location>
</feature>
<feature type="splice variant" id="VSP_029602" description="In isoform 2." evidence="16">
<original>CHQTRSVDQNRKLARKILQEKVDVF</original>
<variation>VDHRRPLRGEAPPKGSTASRDFSQV</variation>
<location>
<begin position="95"/>
<end position="119"/>
</location>
</feature>
<feature type="splice variant" id="VSP_029603" description="In isoform 2." evidence="16">
<location>
<begin position="120"/>
<end position="166"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_084490" description="In COXPD7; decreased cytochrome c oxidase activity in fibroblasts; severe assembly defects in mitochondrial complexes I, IV and V with a milder defect in the assembly of complex III; no effect on mitochondrial transcripts, rRNAs and tRNAs levels." evidence="5">
<original>VLKHIPSGIVVKCHQTRSVDQNRKLARKILQEKVDVFYNGENSPVHKEKREAAKKKQERKKRAKETLEKKKLLKELWESSKKVH</original>
<variation>G</variation>
<location>
<begin position="83"/>
<end position="166"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_084491" description="In SPG55 and COXPD7; decreased activity of mitochondrial respiratory chain; no effect on mitochondrial morphology." evidence="9 10">
<location>
<begin position="116"/>
<end position="166"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_084492" description="In SPG55." evidence="7">
<location>
<begin position="132"/>
<end position="166"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_037325" description="In dbSNP:rs1045496.">
<original>A</original>
<variation>T</variation>
<location>
<position position="134"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_084493" description="In SPG55." evidence="8">
<location>
<begin position="139"/>
<end position="166"/>
</location>
</feature>
<sequence length="166" mass="18828" checksum="CB9B74E0CC7E920C" modified="2001-03-01" version="1" precursor="true">MSTVGLFHFPTPLTRICPAPWGLRLWEKLTLLSPGIAVTPVQMAGKKDYPALLSLDENELEEQFVKGHGPGGQATNKTSNCVVLKHIPSGIVVKCHQTRSVDQNRKLARKILQEKVDVFYNGENSPVHKEKREAAKKKQERKKRAKETLEKKKLLKELWESSKKVH</sequence>
</entry>
<copyright>
Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms Distributed under the Creative Commons Attribution (CC BY 4.0) License
</copyright>
</uniprot>
130 changes: 130 additions & 0 deletions mzLib/Test/DatabaseTests/sequenceVariantOnAlternateIsoform.xml
@@ -0,0 +1,130 @@
<?xml version="1.0" encoding="utf-8" standalone="no" ?>
<uniprot xmlns="http://uniprot.org/uniprot" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/docs/uniprot.xsd">
<entry dataset="Swiss-Prot" created="2008-01-15" modified="2025-06-18" version="148">
<accession>Q96J88</accession>
<accession>Q8IVC7</accession>
<accession>Q8NDQ7</accession>
<name>ESIP1_HUMAN</name>
<protein>
<recommendedName>
<fullName>Epithelial-stromal interaction protein 1</fullName>
</recommendedName>
</protein>
<gene>
<name type="primary">EPSTI1</name>
</gene>
<organism>
<name type="scientific">Homo sapiens</name>
<name type="common">Human</name>
<dbReference type="NCBI Taxonomy" id="9606"/>
<lineage>
</lineage>
</organism>
<proteinExistence type="evidence at protein level"/>
<keyword id="KW-0025">Alternative splicing</keyword>
<keyword id="KW-0175">Coiled coil</keyword>
<keyword id="KW-1267">Proteomics identification</keyword>
<keyword id="KW-1185">Reference proteome</keyword>
<feature type="chain" id="PRO_0000314034" description="Epithelial-stromal interaction protein 1">
<location>
<begin position="1"/>
<end position="318"/>
</location>
</feature>
<feature type="region of interest" description="Disordered" evidence="3">
<location>
<begin position="1"/>
<end position="60"/>
</location>
</feature>
<feature type="coiled-coil region" evidence="2">
<location>
<begin position="73"/>
<end position="188"/>
</location>
</feature>
<feature type="coiled-coil region" evidence="2">
<location>
<begin position="240"/>
<end position="280"/>
</location>
</feature>
<feature type="splice variant" id="VSP_030201" description="In isoform 2." evidence="6 7">
<location>
<begin position="219"/>
<end position="229"/>
</location>
</feature>
<feature type="splice variant" id="VSP_030202" description="In isoform 3." evidence="8">
<original>I</original>
<variation>
SLLVFSRHLRVYEKILTPIWPSSTDLEKPHEMLFLNVILFSLTVFTLISTAHTLDRAVRSDWLLLVLIYACLEELIPELIFNLYCQGNATLFF
</variation>
<location>
<position position="318"/>
</location>
</feature>
<feature type="sequence conflict" description="In Ref. 3; CAD38599." evidence="9" ref="3">
<location>
<position position="83"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_082880" description="In dbSNP:rs11557739." evidence="9">
<original>P</original>
<variation>S</variation>
<location sequence="Q96J88-3">
<position position="338"/>
</location>
</feature>
<feature type="sequence variant" id="VAR_082881" description="In dbSNP:rs1044856." evidence="9">
<original>N</original>
<variation>K</variation>
<location sequence="Q96J88-3">
<position position="399"/>
</location>
</feature>
<evidence type="ECO:0000250" key="1">
<source>
<dbReference type="UniProtKB" id="Q8VDI1"/>
</source>
</evidence>
<evidence type="ECO:0000255" key="2"/>
<evidence type="ECO:0000256" key="3">
<source>
<dbReference type="SAM" id="MobiDB-lite"/>
</source>
</evidence>
<evidence type="ECO:0000269" key="4">
<source>
<dbReference type="PubMed" id="11991720"/>
</source>
</evidence>
<evidence type="ECO:0000269" key="5">
<source>
<dbReference type="PubMed" id="16769699"/>
</source>
</evidence>
<evidence type="ECO:0000303" key="6">
<source>
<dbReference type="PubMed" id="11991720"/>
</source>
</evidence>
<evidence type="ECO:0000303" key="7">
<source>
<dbReference type="PubMed" id="14702039"/>
</source>
</evidence>
<evidence type="ECO:0000303" key="8">
<source>
<dbReference type="PubMed" id="15489334"/>
</source>
</evidence>
<evidence type="ECO:0000305" key="9"/>
<sequence length="318" mass="36793" checksum="28C04AE12042A01A" modified="2008-01-15" version="2">
MNTRNRVVNSGLGASPASRPTRDPQDPSGRQGELSPVEDQREGLEAAPKGPSRESVVHAGQRRTSAYTLIAPNINRRNEIQRIAEQELANLEKWKEQNRAKPVHLVPRRLGGSQSETEVRQKQQLQLMQSKYKQKLKREESVRIKKEAEEAELQKMKAIQREKSNKLEEKKRLQENLRREAFREHQQYKTAEFLSKLNTESPDRSACQSAVCGPQSSTWKLPILPRDHSWARSWAYRDSLKAEENRKLQKMKDEQHQKSELLELKRQQQEQERAKIHQTEHRRVNNAFLDRLQGKSQPGGLEQSGGCWNMNSGNSWGI
</sequence>
</entry>
<copyright>
Copyrighted by the UniProt Consortium, see https://www.uniprot.org/terms Distributed under the Creative Commons Attribution (CC BY 4.0) License
</copyright>
</uniprot>