alrichardbollans/WCVPHomonyms

A Brief Analysis of Ambiguous Homonyms

What is counted as a homonym?

This analysis considers validly published binomial species names (excluding hybrids) in the World Checklist of Vascular Plants (WCVP) v13 [1] that resolve to an accepted name (i.e. unplaced names are ignored). I treat duplicated binomial names as homonyms, and specifically explore ambiguous homonyms (i.e. those that resolve to different accepted species). For example, Abies grandis is a homonym, as it may refer to Abies grandis (Douglas ex D.Don) Lindl. or Abies grandis Hook. When the taxonomic authority is not specified, the name is also ambiguous, because these two records resolve to different accepted species: Abies grandis (Douglas ex D.Don) Lindl. or Abies amabilis Douglas ex J.Forbes.
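The definition above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not the code used for the analysis. The record layout (binomial, authors, accepted species) and the `find_homonyms` helper are assumptions, and the two "Exemplum" names are made-up placeholders. Only the Abies grandis rows come from the example above.

```python
from collections import defaultdict

# Toy records: (binomial, authors, accepted species).
# The Abies rows follow the example in the text; the "Exemplum" rows
# are hypothetical placeholders, not real names.
RECORDS = [
    ("Abies grandis", "(Douglas ex D.Don) Lindl.", "Abies grandis"),
    ("Abies grandis", "Hook.", "Abies amabilis"),
    ("Exemplum unum", "Auth.A", "Exemplum unum"),
    ("Exemplum unum", "Auth.B", "Exemplum unum"),
    ("Exemplum duo", "Auth.C", "Exemplum duo"),
]


def find_homonyms(records):
    """Return (homonyms, ambiguous_homonyms) as sets of binomials."""
    record_count = defaultdict(int)
    accepted_species = defaultdict(set)
    for binomial, _authors, accepted in records:
        record_count[binomial] += 1
        accepted_species[binomial].add(accepted)

    # A homonym is a binomial that appears in more than one record.
    homonyms = {name for name, n in record_count.items() if n > 1}
    # It is ambiguous if those records resolve to different accepted species.
    ambiguous = {name for name in homonyms if len(accepted_species[name]) > 1}
    return homonyms, ambiguous


homonyms, ambiguous = find_homonyms(RECORDS)
print(sorted(homonyms))
print(sorted(ambiguous))
```

Here "Exemplum unum" is a homonym but not an ambiguous one, since both of its records resolve to the same accepted species.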

Overview

Of the 964,967 species records in the WCVP, 57,928 are species binomials that are ambiguous homonyms. The breakdown of the taxon statuses of these 57,928 records is given below.

ambiguous_homonyms_taxon_status_pie_chart.png

Of the 930,686 unique binomial species names in the WCVP, 26,670 are ambiguous homonyms.

The most common homonym is Artemisia rupestris, which is also the binomial name that can refer to the largest number of different accepted species. It may refer to:

  • Artemisia alba subsp. alba
  • Artemisia atrata Lam.
  • Artemisia granatensis Boiss.
  • Artemisia norvegica subsp. norvegica
  • Artemisia assoana Willk.
  • Artemisia rupestris L.
  • Artemisia splendens Willd.
  • Artemisia umbelliformis Lam.

When and Where do Ambiguous Homonyms Come From?

The graph below shows published names over time and the proportion of them that are homonyms (note that a name may not have been homonymous at publication but may have become a homonym through a later publication). It is clear that, over time, fewer homonyms are being published! This is somewhat unsurprising given the connectivity provided by the internet, as well as better nomenclatural standards and name databases such as IPNI.

WCVP Species Publications and Homonym Occurrence_normalized.jpg

The chart below shows the global distributions of the accepted species that are resolved to by ambiguous binomial homonyms.

ambiguous_homonyms_dists.jpg

This partly reflects the global distribution of plant species (e.g. the high density of species in South America; see WCVP species plot).

To assess which regions may be overrepresented, we fit a LOWESS regression model, which is robust to outliers, relating the number of accepted species in a region to the number of accepted species that ambiguous binomial homonyms resolve to:

outliers.jpg

Outliers are highlighted where the residual is more than 2 standard deviations from the mean residual. The global distribution of these residuals is shown below:

residuals_distributions.jpg

We mainly see a concentration in Central and Southern Europe. Further analysis is needed to understand why these regions are overrepresented.

It is plausible that the greater ambiguity of names in these regions could affect statistical analyses when names need to be resolved, though this requires further investigation.
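The outlier rule used in this section can be sketched as follows. The sketch assumes the fitted values have already been obtained from a LOWESS fit (for example with `lowess` from `statsmodels.nonparametric.smoothers_lowess`, though the text does not say which implementation was used). The region labels and all numbers are invented for illustration, `flag_outliers` is a hypothetical helper, and the use of the sample standard deviation is an assumption.

```python
import statistics


def flag_outliers(labels, observed, fitted, n_sd=2.0):
    """Return labels whose residual lies more than n_sd standard
    deviations from the mean residual."""
    residuals = [obs - fit for obs, fit in zip(observed, fitted)]
    mean_res = statistics.mean(residuals)
    sd_res = statistics.stdev(residuals)  # sample standard deviation (assumed)
    return [
        label
        for label, res in zip(labels, residuals)
        if abs(res - mean_res) > n_sd * sd_res
    ]


# Invented data: per-region counts of accepted species reached by
# ambiguous homonyms (observed) and the smoothed expectation (fitted).
labels = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9", "R10"]
observed = [5, 8, 12, 20, 33, 41, 55, 160, 70, 82]
fitted = [6, 8, 11, 21, 32, 42, 54, 60, 71, 81]

flagged = flag_outliers(labels, observed, fitted)
print(flagged)  # ['R8']
```

Note that with very few regions a single extreme value can never exceed 2 sample standard deviations, so a rule like this only makes sense with a reasonable number of data points.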

Discussion

This analysis highlights one specific aspect of ambiguity with regard to plant nomenclature. Though the underlying cause of this ambiguity is improving over time, the ambiguity is persistent and requires disambiguation by people reading scientific papers, datasets, herbarium sheets, blog posts etc. Increasingly, this disambiguation, or name resolution, also needs to be carried out by automated systems that attempt to extract structured data from these types of sources. There is a plethora of software packages for this job (see [2] and here for some examples). However, when faced with ambiguous homonyms, automated systems must choose which name to resolve to. A human resolver may overcome this ambiguity by searching the context surrounding a given name: what does the paper reference? When was it published? Does the paper mention a particular taxonomic authority or dataset version? This kind of context is not available to standard resolution methods. In either case, the most reliable way to overcome this ambiguity is to include taxonomic authorities in plant names.

While the use of taxonomic authorities is standard practice in published work related to taxonomy and nomenclature, this appears to be much less common for other kinds of research and, as far as I understand, is actually discouraged by some journals. Personally, I would strongly encourage the use of full scientific names (i.e. including taxonomic authority) anywhere that 'scientific' names are used so that the related documents contain persistent, unambiguous references.
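To make the point about authorities concrete, here is a minimal sketch of a name lookup, using only the Abies grandis example from earlier. The `INDEX` structure and the `resolve` function are hypothetical. Real resolvers handle author abbreviations, spelling variants and infraspecific ranks, none of which is attempted here.

```python
# Hypothetical lookup: binomial -> {authors: accepted name}.
# Only the Abies grandis example from the text is included.
INDEX = {
    "Abies grandis": {
        "(Douglas ex D.Don) Lindl.": "Abies grandis (Douglas ex D.Don) Lindl.",
        "Hook.": "Abies amabilis Douglas ex J.Forbes",
    },
}


def resolve(name_string, index=INDEX):
    """Return the sorted list of accepted names a name string could refer to.

    Assumes the first two whitespace-separated tokens are the binomial and
    anything after them is the taxonomic authority, matched exactly.
    """
    tokens = name_string.split()
    binomial = " ".join(tokens[:2])
    authors = " ".join(tokens[2:])
    candidates = index.get(binomial, {})
    if authors:
        accepted = candidates.get(authors)
        return [accepted] if accepted else []
    # No authority given: every accepted name for the binomial is possible.
    return sorted(set(candidates.values()))


without_authority = resolve("Abies grandis")
with_authority = resolve("Abies grandis Hook.")
print(without_authority)
print(with_authority)
```

Without an authority the lookup returns two candidate accepted names, and the caller has to pick one. With the authority it returns exactly one.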

References

[1] Rafaël Govaerts et al., ‘The World Checklist of Vascular Plants, a Continuously Updated Resource for Exploring Global Plant Diversity’, Scientific Data 8, no. 1 (2021): 1–10, https://doi.org/10.1038/s41597-021-00997-6.

[2] Matthias Grenié et al., ‘Harmonizing Taxon Names in Biodiversity Data: A Review of Tools, Databases and Best Practices’, Methods in Ecology and Evolution, 18 February 2022, 2041-210X.13802, https://doi.org/10.1111/2041-210X.13802.

Licence

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

CC BY-NC-SA 4.0
