Submitter: Robert Syme
Email: rob.syme@gmail.com
The v3.0 annotations (GFF) contain 9835 entries whose CDS includes an extra codon after the stop codon. For example, Lj0g3v0000709.1 is encoded in the GFF file like so:
chr0 . gene 300849 302392 . . . ID=Lj0g3v0000709;Name=CUFF.74;sequencetype=Protein coding
chr0 . mRNA 300849 302392 . + . ID=Lj0g3v0000709.1;Parent=Lj0g3v0000709;Name=Lj0g3v0000709.1;sequencetype=Protein coding;annotation=hypothetical protein SPAPADRAFT_64676 [Spathaspora passalidarum NRRL Y-27907] gi|344305338|gb|EGW35570.1|
chr0 . exon 300849 301357 . + . ID=Lj0g3v0000709.1.exon.1;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
chr0 . exon 302170 302392 . + . ID=Lj0g3v0000709.1.exon.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
chr0 . CDS 300970 301357 . + 0 ID=Lj0g3v0000709.1.CDS.1;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
chr0 . CDS 302170 302183 . + 2 ID=Lj0g3v0000709.1.CDS.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
The CDS is split across two exons (388 bp and 14 bp) for a total of 402 bp, or 134 codons. When translated, those 134 codons give 132 amino acids, a stop codon, and one further residue (R):
>Lj0g3v0000709.1
MSQIFFLVAATTCHRSFSSSPPFLLISSHHHHNNQGANTTSPYIMFFFLLQSKTNHHCPFFSFSSLWPQKEQHPHAPHEP
PPSRFLLLHGWPNAPQTSVLLPHVAPLDSHDGHQSAPLRTTITFIFQFYGYE*R
Is the extra amino acid deliberately included in the CDS feature? Should these 9835 proteins with a similar extra codon be included in comparative analyses?
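In case it helps to reproduce the arithmetic, here is a minimal Python sketch that sums the CDS segment lengths for one transcript and derives the phase each segment should carry. It assumes plus-strand features and whitespace-separated columns as pasted in this report; the function name `cds_summary` and the variable `EXAMPLE_ROWS` are my own, not part of any annotation tool.

```python
# Sketch only: plus-strand transcripts, columns separated by whitespace.
# cds_summary and EXAMPLE_ROWS are illustrative names, not an existing API.

EXAMPLE_ROWS = [
    "chr0 . exon 302170 302392 . + . ID=Lj0g3v0000709.1.exon.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding",
    "chr0 . CDS 300970 301357 . + 0 ID=Lj0g3v0000709.1.CDS.1;Parent=Lj0g3v0000709.1;sequencetype=Protein coding",
    "chr0 . CDS 302170 302183 . + 2 ID=Lj0g3v0000709.1.CDS.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding",
]


def cds_summary(gff_lines):
    """Return (segment_lengths, expected_phases, total_bp, codons) for CDS rows."""
    segments = []
    for line in gff_lines:
        # The first eight columns contain no spaces; the ninth (attributes) may.
        cols = line.split(None, 8)
        if len(cols) < 8 or cols[2] != "CDS":
            continue
        segments.append((int(cols[3]), int(cols[4])))
    segments.sort()  # 5' to 3' order on the plus strand

    lengths, phases, consumed = [], [], 0
    for start, end in segments:
        # Phase: bases to skip at the segment start to reach the next codon.
        phases.append((3 - consumed % 3) % 3)
        length = end - start + 1  # GFF coordinates are 1-based and inclusive
        lengths.append(length)
        consumed += length
    return lengths, phases, consumed, consumed // 3


print(cds_summary(EXAMPLE_ROWS))  # ([388, 14], [0, 2], 402, 134)
```

The derived phases (0 and 2) match the phase column in the file, so the CDS coordinates appear internally consistent; the extra codon seems to come from the end coordinate itself rather than from a frame error.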
Similarly, there seem to be a number of CDS features that contain premature stop codons. For example:
chr0 . gene 8669307 8670996 . . . ID=Lj0g3v0021349;Name=CUFF.1180;sequencetype=Protein coding
chr0 . mRNA 8669359 8670996 . + . ID=Lj0g3v0021349.2;Parent=Lj0g3v0021349;Name=Lj0g3v0021349.2;sequencetype=Protein coding;annotation=NoHit
chr0 . exon 8669359 8669413 . + . ID=Lj0g3v0021349.2.exon.1;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . exon 8669509 8669584 . + . ID=Lj0g3v0021349.2.exon.2;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . exon 8669905 8670079 . + . ID=Lj0g3v0021349.2.exon.3;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . exon 8670485 8670658 . + . ID=Lj0g3v0021349.2.exon.4;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . exon 8670754 8670856 . + . ID=Lj0g3v0021349.2.exon.5;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . exon 8670937 8670996 . + . ID=Lj0g3v0021349.2.exon.6;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . CDS 8669381 8669413 . + 0 ID=Lj0g3v0021349.2.CDS.1;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . CDS 8669509 8669584 . + 0 ID=Lj0g3v0021349.2.CDS.2;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0 . CDS 8669905 8669933 . + 2 ID=Lj0g3v0021349.2.CDS.3;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
Which translates to the protein:
>Lj0g3v0021349.2
MRMGVDNMSYEELLALGERIGHVNTGLSEDSLTKKQ*ADFIQGY*I
Are these genes to be translated with an alternative codon table?
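For reference, the two patterns described in this report (a stop codon in the middle of the translation, and residues translated after the final stop) can be separated with a few lines of Python. This is only a sketch of the check, assuming the protein is already translated with `*` marking stops; `stop_report` is a hypothetical helper name.

```python
# Sketch only: classify '*' positions in an already-translated CDS.
# stop_report is an illustrative name, not part of any library.

def stop_report(protein):
    """Return (internal_stops, trailing_residues) for a translated CDS.

    internal_stops    : 0-based positions of every '*' except the last one
    trailing_residues : whatever was translated after the last '*'
    """
    stops = [i for i, aa in enumerate(protein) if aa == "*"]
    if not stops:
        return [], ""
    return stops[:-1], protein[stops[-1] + 1:]


# The translation of Lj0g3v0021349.2 quoted above.
example = "MRMGVDNMSYEELLALGERIGHVNTGLSEDSLTKKQ*ADFIQGY*I"
internal, trailing = stop_report(example)
print("internal stops at:", internal)
print("residues after final stop:", trailing)
```

A non-empty first value indicates a premature stop; a non-empty second value indicates the extra-codon pattern. The example sequence shows both.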
Sorry to bother you, and I hope that I've not misunderstood the annotation gff.
- Rob Syme