Getting all synonyms, common names and Genbank common name along with the Scientific name of taxons #382

@bilalix

I want to generate a JSON file that maps each taxonID to its scientific name and synonyms (for simplicity, I use "synonyms" here to cover synonyms, common names and the Genbank common name). It will have the following structure:

{
   taxonid1 : ["sciName, syn1, syn2.."],
   taxonid2 : ["sciName, syn1, syn2.."],
   ......
 }

I would like to do that for all descendants of a group (e.g. Viridiplantae), based on their taxonIDs.

To get the desired results, I first used ncbi.get_descendant_taxa():

descendants = ncbi.get_descendant_taxa('Viridiplantae', intermediate_nodes=True)

This gave me the list of taxonIDs. Afterwards, I downloaded the names.dmp file (which contains the synonyms) from NCBI and extracted the information I needed from it.
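For reference, the extraction step can be sketched roughly as below. This is a minimal sketch, not necessarily the exact script used: it assumes the standard names.dmp layout (four fields separated by a tab, a pipe and a tab, with each row ending in a tab and a pipe), and the function name `parse_names_dmp` and the `SYNONYM_CLASSES` set are hypothetical names introduced here for illustration.

```python
import json
from collections import defaultdict

# Name classes treated as "synonyms" in this issue (an assumption; adjust as needed).
SYNONYM_CLASSES = {"synonym", "common name", "genbank common name"}


def parse_names_dmp(lines, wanted_taxids):
    """Map each wanted taxid to [scientific name, synonym, ...].

    `lines` is any iterable of names.dmp rows, e.g. an open file handle.
    """
    wanted = set(wanted_taxids)
    sci_names = {}
    others = defaultdict(list)
    for line in lines:
        row = line.rstrip("\n")
        # Drop the trailing "\t|" terminator before splitting the fields.
        if row.endswith("\t|"):
            row = row[:-2]
        fields = row.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        taxid, name, name_class = int(fields[0]), fields[1], fields[3]
        if taxid not in wanted:
            continue
        if name_class == "scientific name":
            sci_names[taxid] = name
        elif name_class in SYNONYM_CLASSES and name not in others[taxid]:
            others[taxid].append(name)
    # Scientific name first, then the other names in file order.
    return {taxid: [sci] + others[taxid] for taxid, sci in sci_names.items()}


# Possible usage (paths are placeholders):
# with open("names.dmp", encoding="utf-8") as handle:
#     mapping = parse_names_dmp(handle, descendants)
# with open("synonyms.json", "w", encoding="utf-8") as out:
#     json.dump(mapping, out, indent=2)
```

Note that `json.dump` writes the integer taxonIDs as string keys, since JSON object keys are always strings.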

This feels redundant, since ete3 already downloads the dump files and stores them in an SQLite database. However, I had to take this approach because when I looked in the database, I didn't find all the synonyms. For instance, Triticum aestivum has the following names:

Scientific name: Triticum aestivum L.
Genbank common name: bread wheat
Synonym: Triticum aestivum subsp. aestivum
         Triticum vulgare L.
Common name: Canadian hard winter wheat
             Common wheat
             Wheat
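To compare the list above with what the ete3 database holds for a taxon, something like the following could be used. It is only a sketch under stated assumptions: it assumes the SQLite file contains a `species` table with `taxid`, `spname` and `common` columns and a `synonym` table with `taxid` and `spname` columns, which should be verified against your own copy of the database. The function name `names_in_ete_db` is hypothetical.

```python
import sqlite3


def names_in_ete_db(conn, taxid):
    """Return the names stored for one taxid under the assumed schema."""
    cur = conn.cursor()
    # Assumed table: species(taxid, spname, common, ...)
    cur.execute("SELECT spname, common FROM species WHERE taxid = ?", (taxid,))
    row = cur.fetchone()
    # Assumed table: synonym(taxid, spname)
    cur.execute(
        "SELECT spname FROM synonym WHERE taxid = ? ORDER BY spname", (taxid,)
    )
    synonyms = [r[0] for r in cur.fetchall()]
    return {
        "scientific_name": row[0] if row else None,
        "common": row[1] if row else None,
        "synonyms": synonyms,
    }


# Possible usage (the path is an assumption about the default location):
# import os
# conn = sqlite3.connect(os.path.expanduser("~/.etetoolkit/taxa.sqlite"))
# print(names_in_ete_db(conn, 4565))
```

Any name from the list above that is missing from the returned dictionary is one the database does not store for that taxon.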

My question is: would it be possible to add all this information when the database is created? For instance, if we called

ncbi.get_common_names([4565])

We can get:

{4565:  ["Canadian hard winter wheat", "Common wheat", "Wheat"]}

And the same for synonyms and the Genbank common name?

Thank you!
