Profile Extension: 'ExactMatchPrimarySourceMultiAcceptedTaxonomicMatch' tiebreaker #17

@Dahlializi

Labels: enhancement

Exploring resolutions with FAILED_FORCED_INPUT status in resolved_taxa/taxonopy-v0.1/source=eol/part-00000-34c55989-4190-4247-86af-fac6c8b665bb-c000.snappy.resolved.parquet

One of the most significant failure reasons is "Tie between # results with equal taxonomic matches" (where # is the number of tied results), produced by exact_match_primary_source_multi_accepted_taxonomic_match.py.

Example:

Current resolution:

{
  "uuid": "481396b6-9b8f-4aff-98f7-305fddfb56f3",
  "scientific_name": "Cupidopsis jobates (Hopffer, 1855)",
  "common_name": "",
  "source_dataset": "eol",
  "source_id": "20451407",
  "resolution_status": "FAILED_FORCED_INPUT",
  "kingdom": "Metazoa",
  "phylum": "Arthropoda",
  "class": "Insecta",
  "order": "Lepidoptera",
  "family": "Lycaenidae",
  "genus": "Pterygota",
  "species": "Cupidopsis jobates (Hopffer, 1855)",
  "resolution_path": "RESOLVED",
  "resolution_strategy": "ForceFailedToInput",
  "final_query_term": "Cupidopsis jobates (Hopffer, 1855)",
  "final_query_rank": "species",
  "final_data_source_id": 11,
  "meta_matched_current_name": null,
  "meta_matched_result_id": null,
  "meta_matched_full_name": null,
  "meta_author_disambiguation": null,
  "meta_accepted_record_id": null,
  "meta_synonym_matched": null,
  "meta_accepted_name": null,
  "meta_fuzzy_matched_name": null,
  "meta_edit_distance": null
}

Trace Entry:

{
  "entry": {
    "uuid": "481396b6-9b8f-4aff-98f7-305fddfb56f3",
    "scientific_name": "Cupidopsis jobates (Hopffer, 1855)",
    "common_name": "",
    "kingdom": "Metazoa",
    "phylum": "Arthropoda",
    "class_": "Insecta",
    "order": "Lepidoptera",
    "family": "Lycaenidae",
    "genus": "Pterygota",
    "species": "Cupidopsis jobates (Hopffer, 1855)",
    "source_dataset": "eol",
    "source_id": "20451407"
  },
  "group": {
    "entry_uuids": [
      "2c46684b-496b-4d64-b487-4251ebdbe310",
      "aa1a82a2-2d50-4681-b7e3-3768c674799e",
      "64f7a51d-a0c9-4f34-8b24-1ea6d1e3fab1",
      "083599e4-414e-4f5a-97f0-f41ed97e0002",
      "60abecc6-b094-41b4-a186-3d9a44f29f37",
      "1e67550c-f952-4cdc-a24c-4faf655f9f61",
      "43242bc9-074d-4615-b15e-ea9840371518",
      "481396b6-9b8f-4aff-98f7-305fddfb56f3",
      "56e23a31-b5b3-474f-9e8f-801c44a7948d",
      "43a67fdd-261d-42c9-b6d6-d4fd3ec9a023"
    ],
    "kingdom": "Metazoa",
    "phylum": "Arthropoda",
    "class_": "Insecta",
    "order": "Lepidoptera",
    "family": "Lycaenidae",
    "genus": "Pterygota",
    "species": "Cupidopsis jobates (Hopffer, 1855)",
    "scientific_name": "Cupidopsis jobates (Hopffer, 1855)",
    "common_names": [],
    "key": "b23906440604c734648a630620dff46874353e48f5e4723a9f542dc8989f657b",
    "group_count": 10
  },
  "query_plan": {
    "term": "Cupidopsis jobates (Hopffer, 1855)",
    "rank": "species",
    "source_id": 11
  },
  "resolution_attempts": [
    {
      "key": "4391cf2052ad6c22c8f18db002d45191e37b9a11d59d54b2695de16e7576eba8",
      "entry_group_key": "b23906440604c734648a630620dff46874353e48f5e4723a9f542dc8989f657b",
      "query_term": "Cupidopsis jobates (Hopffer, 1855)",
      "query_rank": "species",
      "data_source_id": 11,
      "status": "FAILED",
      "is_successful": false,
      "is_retry": false,
      "previous_key": null,
      "resolution_strategy_name": "ExactMatchPrimarySourceMultiAcceptedTaxonomicMatch",
      "failure_reason": "Tie between 2 results with equal taxonomic matches",
      "resolved_classification": null,
      "error": null,
      "metadata": {
        "match_count": 5,
        "total_results": 2,
        "tied_results_count": 2,
        "tied_record_ids": [
          "1929059",
          "9894815"
        ],
        "selection_method": "taxonomic_hierarchy_match_tie"
      }
    }
  ]
}

This profile strategy currently simply fails when it encounters ambiguities like the one shown here.

However, the tied results are not identical matches for the query term; they differ slightly (e.g. a single-letter spelling difference, the presence or absence of an author/year suffix, or resolution at a higher rank).

Gnverifier Verification:

{
  "id": "af6739ea-1c38-5677-8dc9-125c112b1a9c",
  "name": "Cupidopsis jobates (Hopffer, 1855)",
  "cardinality": 2,
  "matchType": "Exact",
  "results": [
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "1929059",
      "outlink": "https://gbif.org/species/1929059",
      "entryDate": "2024-01-11",
      "sortScore": 9.427395485947873,
      "matchedNameID": "af6739ea-1c38-5677-8dc9-125c112b1a9c",
      "matchedName": "Cupidopsis jobates (Hopffer, 1855)",
      "matchedCardinality": 2,
      "matchedCanonicalSimple": "Cupidopsis jobates",
      "matchedCanonicalFull": "Cupidopsis jobates",
      "currentRecordId": "1929059",
      "currentNameId": "af6739ea-1c38-5677-8dc9-125c112b1a9c",
      "currentName": "Cupidopsis jobates (Hopffer, 1855)",
      "currentCardinality": 2,
      "currentCanonicalSimple": "Cupidopsis jobates",
      "currentCanonicalFull": "Cupidopsis jobates",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Animalia|Arthropoda|Insecta|Lepidoptera|Lycaenidae|Cupidopsis|Cupidopsis jobates",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "1|54|216|797|5473|1929057|1929059",
      "editDistance": 0,
      "stemEditDistance": 0,
      "matchType": "Exact",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 1,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 1,
        "acceptedNameScore": 1,
        "parsingQualityScore": 1
      }
    },
    {
      "dataSourceId": 11,
      "dataSourceTitleShort": "GBIF Backbone Taxonomy",
      "curation": "AutoCurated",
      "recordId": "9894815",
      "outlink": "https://gbif.org/species/9894815",
      "entryDate": "2024-01-11",
      "sortScore": 9.387489602933003,
      "matchedNameID": "95cc66ca-a6a8-5d90-8e33-834402e866b4",
      "matchedName": "Cupidopsis iobates",
      "matchedCardinality": 2,
      "matchedCanonicalSimple": "Cupidopsis iobates",
      "matchedCanonicalFull": "Cupidopsis iobates",
      "currentRecordId": "9894815",
      "currentNameId": "95cc66ca-a6a8-5d90-8e33-834402e866b4",
      "currentName": "Cupidopsis iobates",
      "currentCardinality": 2,
      "currentCanonicalSimple": "Cupidopsis iobates",
      "currentCanonicalFull": "Cupidopsis iobates",
      "taxonomicStatus": "Accepted",
      "isSynonym": false,
      "classificationPath": "Animalia|Arthropoda|Insecta|Lepidoptera|Lycaenidae|Cupidopsis|Cupidopsis iobates",
      "classificationRanks": "kingdom|phylum|class|order|family|genus|species",
      "classificationIds": "1|54|216|797|5473|1929057|9894815",
      "editDistance": 1,
      "stemEditDistance": 0,
      "matchType": "Fuzzy",
      "scoreDetails": {
        "cardinalityScore": 1,
        "infraSpecificRankScore": 0,
        "fuzzyLessScore": 0.6666667,
        "curatedDataScore": 0.33333334,
        "authorMatchScore": 0.14285715,
        "acceptedNameScore": 1,
        "parsingQualityScore": 1
      }
    }
  ],
  "curation": "AutoCurated"
}
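
To make the difference between the two tied records explicit, here is a small sketch that compares a few fields across results. The helper name `differing_fields` is hypothetical (it is not part of taxonopy or gnverifier), and the dicts are trimmed copies of the gnverifier output above.

```python
def differing_fields(results: list[dict], fields: list[str]) -> dict:
    """Map each field to its per-result values, keeping only fields that differ."""
    diff = {}
    for field in fields:
        values = [r.get(field) for r in results]
        if any(v != values[0] for v in values[1:]):
            diff[field] = values
    return diff


# Trimmed copies of the two gnverifier results from this issue.
results = [
    {"recordId": "1929059", "matchType": "Exact", "editDistance": 0,
     "taxonomicStatus": "Accepted"},
    {"recordId": "9894815", "matchType": "Fuzzy", "editDistance": 1,
     "taxonomicStatus": "Accepted"},
]

print(differing_fields(results, ["matchType", "editDistance", "taxonomicStatus"]))
# → {'matchType': ['Exact', 'Fuzzy'], 'editDistance': [0, 1]}
```

Both records are Accepted, so only the match type and edit distance separate them here.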

However, the profile's tie rule says that if there are multiple 'best matches' (the taxonomic matches with the highest score at the most specific ranks, from species to final_query_rank), they are tied and the attempt fails. See step 6 and step 7 in exact_match_primary_source_multi_accepted_taxonomic_match.py.
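
A simplified illustration of how such a tie can arise, not the actual step 6 and step 7 logic. It assumes a naive score that counts ranks whose input value equals the value in the result's classification path; the names `count_rank_matches` and `find_tied` are hypothetical, and this naive count is not meant to reproduce the match_count reported in the trace.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def count_rank_matches(entry: dict, result: dict) -> int:
    """Count ranks where the input value equals the result's value (naive scoring)."""
    names = result["classificationPath"].split("|")
    ranks = result["classificationRanks"].split("|")
    by_rank = dict(zip(ranks, names))
    return sum(
        1 for rank in RANKS
        if entry.get(rank) and entry.get(rank) == by_rank.get(rank)
    )


def find_tied(entry: dict, results: list[dict]) -> list[dict]:
    """Return every result that shares the highest naive match count."""
    if not results:
        return []
    scores = [count_rank_matches(entry, r) for r in results]
    best = max(scores)
    return [r for r, s in zip(results, scores) if s == best]


# Input ranks and trimmed results copied from this issue.
entry = {
    "kingdom": "Metazoa", "phylum": "Arthropoda", "class": "Insecta",
    "order": "Lepidoptera", "family": "Lycaenidae", "genus": "Pterygota",
    "species": "Cupidopsis jobates (Hopffer, 1855)",
}
ranks = "kingdom|phylum|class|order|family|genus|species"
results = [
    {"recordId": "1929059", "classificationRanks": ranks,
     "classificationPath": "Animalia|Arthropoda|Insecta|Lepidoptera|Lycaenidae|Cupidopsis|Cupidopsis jobates"},
    {"recordId": "9894815", "classificationRanks": ranks,
     "classificationPath": "Animalia|Arthropoda|Insecta|Lepidoptera|Lycaenidae|Cupidopsis|Cupidopsis iobates"},
]

tied = find_tied(entry, results)
print([r["recordId"] for r in tied])  # → ['1929059', '9894815']
```

Because the two records share every rank above species, any score built only from the hierarchy cannot separate them.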

Consider adding a separate tiebreaker that runs when a tie is detected and picks the best match based on these slight differences, while staying strict enough that genuinely ambiguous cases still fail.
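
One possible shape for this, as a minimal sketch only. It assumes the tied gnverifier results are available as dicts carrying the matchType and editDistance fields shown above; the function name `break_tie` is hypothetical and not part of taxonopy. To avoid being too lax, it resolves the tie only when exactly one candidate is an exact, zero-edit match and otherwise leaves the attempt failed.

```python
from typing import Optional


def break_tie(tied_results: list[dict]) -> Optional[dict]:
    """Return the single exact, zero-edit match among tied results, else None."""
    exact = [
        r for r in tied_results
        if r.get("matchType") == "Exact" and r.get("editDistance") == 0
    ]
    # Resolve only when exactly one candidate qualifies; otherwise stay failed.
    return exact[0] if len(exact) == 1 else None


# Trimmed copies of the two tied gnverifier results from this issue.
tied = [
    {"recordId": "1929059", "matchType": "Exact", "editDistance": 0},
    {"recordId": "9894815", "matchType": "Fuzzy", "editDistance": 1},
]

winner = break_tie(tied)
print(winner["recordId"] if winner else "still tied")  # → 1929059
```

Other signals from the gnverifier output, such as authorMatchScore or sortScore, could be added as further criteria, but any threshold on them would need checking against more of the failed cases first.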
