
Lotus v3.0 annotations with extra codons #51

@terrymun

Submitter: Robert Syme
Email: rob.syme@gmail.com
The v3.0 annotation file (GFF) contains 9835 annotations that include an extra codon after the stop codon. For example, the protein translated from Lj0g3v0000709.1 is encoded in the GFF file like so:

chr0    .       gene    300849  302392  .       .       .       ID=Lj0g3v0000709;Name=CUFF.74;sequencetype=Protein coding
chr0    .       mRNA    300849  302392  .       +       .       ID=Lj0g3v0000709.1;Parent=Lj0g3v0000709;Name=Lj0g3v0000709.1;sequencetype=Protein coding;annotation=hypothetical protein SPAPADRAFT_64676 [Spathaspora passalidarum NRRL Y-27907] gi|344305338|gb|EGW35570.1|
chr0    .       exon    300849  301357  .       +       .       ID=Lj0g3v0000709.1.exon.1;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
chr0    .       exon    302170  302392  .       +       .       ID=Lj0g3v0000709.1.exon.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
chr0    .       CDS     300970  301357  .       +       0       ID=Lj0g3v0000709.1.CDS.1;Parent=Lj0g3v0000709.1;sequencetype=Protein coding
chr0    .       CDS     302170  302183  .       +       2       ID=Lj0g3v0000709.1.CDS.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding

The CDS is split across two exons (388 bp and 14 bp), for a total of 402 bp, or 134 codons. When translated, the 134 codons give:

>Lj0g3v0000709.1
MSQIFFLVAATTCHRSFSSSPPFLLISSHHHHNNQGANTTSPYIMFFFLLQSKTNHHCPFFSFSSLWPQKEQHPHAPHEP
PPSRFLLLHGWPNAPQTSVLLPHVAPLDSHDGHQSAPLRTTITFIFQFYGYE*R
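The length arithmetic above, and the phase values in the CDS rows, can be re-derived directly from the GFF. The sketch below assumes + strand features, a single Parent per CDS row, and the GFF convention of 1-based inclusive coordinates (length = end - start + 1). The helper names are hypothetical, not part of any existing tool.

```python
def parse_cds(gff_lines, transcript_id):
    """Return (start, end, phase) for each CDS row of one transcript, sorted by start.

    Assumes + strand features; minus-strand CDS would need reverse ordering.
    """
    segments = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # maxsplit=8 keeps the attributes column whole even when it contains
        # spaces, e.g. "sequencetype=Protein coding".
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].strip().split(";") if "=" in kv)
        if transcript_id in attrs.get("Parent", "").split(","):
            segments.append((int(cols[3]), int(cols[4]), int(cols[7])))
    return sorted(segments)


def cds_lengths(segments):
    """GFF coordinates are 1-based and inclusive, so each length is end - start + 1."""
    return [end - start + 1 for start, end, _ in segments]


def expected_phases(segments):
    """Phase each segment should carry, given the first segment's phase (+ strand).

    Phase = number of leading bases to skip to reach the next codon boundary.
    """
    first_phase = segments[0][2]
    phases, consumed = [first_phase], 0
    for start, end, _ in segments[:-1]:
        consumed += end - start + 1
        phases.append((3 - (consumed - first_phase) % 3) % 3)
    return phases


rows = [
    "chr0\t.\tCDS\t300970\t301357\t.\t+\t0\tID=Lj0g3v0000709.1.CDS.1;Parent=Lj0g3v0000709.1;sequencetype=Protein coding",
    "chr0\t.\tCDS\t302170\t302183\t.\t+\t2\tID=Lj0g3v0000709.1.CDS.2;Parent=Lj0g3v0000709.1;sequencetype=Protein coding",
]
segs = parse_cds(rows, "Lj0g3v0000709.1")
lengths = cds_lengths(segs)
print(lengths, sum(lengths), sum(lengths) // 3)          # [388, 14] 402 134
print(expected_phases(segs), [p for _, _, p in segs])    # [0, 2] [0, 2]
```

If the computed and annotated phases agree, as they do here, the extra codon is not explained by a phase error; the CDS coordinates themselves simply extend one codon past the stop.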

Is the extra amino acid deliberately included in the CDS feature? Should these 9835 proteins with a similar extra codon be included in comparative analyses?
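Until the intent is clarified, one way to handle these records before a comparative analysis is to flag each translated protein by where its stop codon(s) fall. This is a minimal sketch; the category labels and the trimming helper are hypothetical, and whether trimming is appropriate is exactly the open question above.

```python
def classify_protein(protein):
    """Label a translated CDS by where its stop codon(s) ('*') fall (hypothetical categories)."""
    stops = [i for i, aa in enumerate(protein) if aa == "*"]
    if not stops:
        return "no_stop"
    if stops == [len(protein) - 1]:
        return "terminal_stop"           # single stop at the very end
    if stops == [len(protein) - 2]:
        return "extra_codon_after_stop"  # exactly one residue after the only stop
    return "internal_stop"               # at least one stop earlier in the sequence


def trim_at_first_stop(protein):
    """Drop the first stop and everything after it (one possible clean-up, not a recommendation)."""
    cut = protein.find("*")
    return protein if cut == -1 else protein[:cut]


p1 = ("MSQIFFLVAATTCHRSFSSSPPFLLISSHHHHNNQGANTTSPYIMFFFLLQSKTNHHCPFFSFSSLWPQKEQHPHAPHEP"
      "PPSRFLLLHGWPNAPQTSVLLPHVAPLDSHDGHQSAPLRTTITFIFQFYGYE*R")
p2 = "MRMGVDNMSYEELLALGERIGHVNTGLSEDSLTKKQ*ADFIQGY*I"
print(classify_protein(p1), classify_protein(p2))  # extra_codon_after_stop internal_stop
print(len(p1), len(trim_at_first_stop(p1)))         # 134 132
```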

Similarly, there seem to be a number of CDS features that contain premature stop codons. For example:

chr0    .       gene    8669307 8670996 .       .       .       ID=Lj0g3v0021349;Name=CUFF.1180;sequencetype=Protein coding
chr0    .       mRNA    8669359 8670996 .       +       .       ID=Lj0g3v0021349.2;Parent=Lj0g3v0021349;Name=Lj0g3v0021349.2;sequencetype=Protein coding;annotation=NoHit
chr0    .       exon    8669359 8669413 .       +       .       ID=Lj0g3v0021349.2.exon.1;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       exon    8669509 8669584 .       +       .       ID=Lj0g3v0021349.2.exon.2;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       exon    8669905 8670079 .       +       .       ID=Lj0g3v0021349.2.exon.3;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       exon    8670485 8670658 .       +       .       ID=Lj0g3v0021349.2.exon.4;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       exon    8670754 8670856 .       +       .       ID=Lj0g3v0021349.2.exon.5;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       exon    8670937 8670996 .       +       .       ID=Lj0g3v0021349.2.exon.6;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       CDS     8669381 8669413 .       +       0       ID=Lj0g3v0021349.2.CDS.1;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       CDS     8669509 8669584 .       +       0       ID=Lj0g3v0021349.2.CDS.2;Parent=Lj0g3v0021349.2;sequencetype=Protein coding
chr0    .       CDS     8669905 8669933 .       +       2       ID=Lj0g3v0021349.2.CDS.3;Parent=Lj0g3v0021349.2;sequencetype=Protein coding

This translates to the protein:

>Lj0g3v0021349.2
MRMGVDNMSYEELLALGERIGHVNTGLSEDSLTKKQ*ADFIQGY*I
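To see where the premature stops sit in the genome, each internal stop can be mapped from its codon index back to genomic coordinates using the CDS rows above. A sketch assuming + strand, CDS segments sorted by start, and 1-based GFF coordinates; the function names are hypothetical.

```python
def internal_stops(protein):
    """0-based codon indices of stops that are not the final residue."""
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


def codon_to_genome(segments, codon_index):
    """Genomic positions of the three bases of a codon (+ strand, 1-based GFF coordinates)."""
    first_phase = segments[0][2]
    positions = []
    for offset in range(first_phase + 3 * codon_index, first_phase + 3 * codon_index + 3):
        remaining = offset
        # Walk the CDS segments until the offset falls inside one of them.
        for start, end, _ in segments:
            length = end - start + 1
            if remaining < length:
                positions.append(start + remaining)
                break
            remaining -= length
        else:
            raise ValueError(f"codon {codon_index} runs past the annotated CDS")
    return positions


# (start, end, phase) taken from the three CDS rows of Lj0g3v0021349.2
segs = [(8669381, 8669413, 0), (8669509, 8669584, 0), (8669905, 8669933, 2)]
protein = "MRMGVDNMSYEELLALGERIGHVNTGLSEDSLTKKQ*ADFIQGY*I"
for idx in internal_stops(protein):
    print(idx + 1, codon_to_genome(segs, idx))
# 37 [8669584, 8669905, 8669906]
# 45 [8669928, 8669929, 8669930]
```

By this arithmetic the first stop codon straddles the exon 2/exon 3 junction (last base of exon 2, first two bases of exon 3), which may be worth a closer look when checking these models.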

Are these genes to be translated with an alternative codon table?
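One way to test the codon-table hypothesis is to re-translate the same CDS nucleotide sequence under a different table and check whether the internal stops disappear. The sketch below builds NCBI translation table 1 (the standard code) and a hypothetical variant that reassigns TAA and TAG to Gln, as NCBI table 6 does; the toy sequence is illustrative only, and nothing here suggests Lotus actually uses a non-standard code. Libraries such as Biopython accept an NCBI table number via `Seq.translate(table=...)`.

```python
from itertools import product

BASES = "TCAG"
# NCBI table 1 amino acids, in codon order TTT, TTC, TTA, TTG, TCT, ... (third base fastest).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# Hypothetical alternative: TAA and TAG read as Gln (the reassignment used by NCBI table 6).
TABLE6_LIKE = {**STANDARD, "TAA": "Q", "TAG": "Q"}


def translate(cds, table=STANDARD):
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    cds = cds.upper()
    return "".join(table.get(cds[i:i + 3], "X") for i in range(0, len(cds) - len(cds) % 3, 3))


toy = "ATGTAAGGC"  # hypothetical CDS, not from the Lotus genome
print(translate(toy))               # M*G
print(translate(toy, TABLE6_LIKE))  # MQG
```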

Sorry to bother you, and I hope that I've not misunderstood the annotation gff.

Rob Syme
