Description
I want to generate a JSON file containing each taxonID with its corresponding scientific name and synonyms (for simplicity, the term synonyms here covers synonyms, common names and GenBank common name). It will have the following structure:
{
taxonid1 : ["sciName, syn1, syn2.."],
taxonid2 : ["sciName, syn1, syn2.."],
......
}
I want to do this for a group of descendants (e.g. Viridiplantae), based on their taxonIDs.
To get the desired results, I first used ncbi.get_descendant_taxa():
descendants = ncbi.get_descendant_taxa('Viridiplantae', intermediate_nodes=True)
This gives me the list of taxonIDs. Afterwards I downloaded the names.dmp file (which contains the synonyms) from NCBI and extracted the information I needed from it.
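For reference, the extraction step can be sketched roughly as follows. This is a minimal sketch and not part of ete3: collect_names is a hypothetical helper, and it assumes the usual names.dmp layout, where fields are separated by a tab, a pipe and a tab, each line ends with a tab and a pipe, and the fourth field holds the name class.

```python
import json

# Name classes treated as "synonyms" in the broad sense used in this issue.
EXTRA_CLASSES = {"synonym", "common name", "genbank common name"}


def collect_names(lines, taxids):
    """Map each wanted taxid to [scientific name, other names...].

    `lines` is any iterable of names.dmp lines (an open file works);
    `taxids` is an iterable of integer taxon IDs.
    """
    wanted = set(taxids)
    scientific = {}
    extras = {}
    for line in lines:
        row = line.rstrip("\n")
        if row.endswith("\t|"):
            row = row[:-2]  # drop the line terminator
        fields = row.split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        taxid, name, name_class = int(fields[0]), fields[1], fields[3]
        if taxid not in wanted:
            continue
        if name_class == "scientific name":
            scientific[taxid] = name
        elif name_class in EXTRA_CLASSES:
            extras.setdefault(taxid, []).append(name)
    # Scientific name first, then the other names in file order.
    return {t: [scientific[t]] + extras.get(t, []) for t in sorted(scientific)}


if __name__ == "__main__":
    # Tiny hand-written sample in names.dmp format (not real dump content).
    sample = [
        "4565\t|\tTriticum aestivum\t|\t\t|\tscientific name\t|\n",
        "4565\t|\tbread wheat\t|\t\t|\tgenbank common name\t|\n",
        "4565\t|\tTriticum vulgare\t|\t\t|\tsynonym\t|\n",
    ]
    print(json.dumps(collect_names(sample, [4565]), indent=2))
```

With the real data, the call would be something like collect_names(open("names.dmp"), descendants), where descendants is the list returned by ncbi.get_descendant_taxa() and the file path is wherever the dump was unpacked.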
This feels like repeated work, since ete3 already downloads the dump files and stores them in an SQLite database. But I was forced to follow this approach because when I looked in the database I did not find all of the synonyms. For instance, Triticum aestivum has the following names:
Scientific name: Triticum aestivum L.
Genbank common name: bread wheat
Synonym: Triticum aestivum subsp. aestivum
Triticum vulgare L.
Common name: Canadian hard winter wheat
Common wheat
Wheat
My question: would it be possible to add all of this information when the database is created? For instance, if we called
ncbi.get_common_names([4565])
We can get:
{4565: ["Canadian hard winter wheat", "Common wheat", "Wheat"]}
And the same thing for synonyms, common names?
Thank you!