Description
Labels: enhancement
While exploring resolutions with FAILED_FORCED_INPUT status in resolved_taxa/taxonopy-v0.1/source=eol/part-00000-34c55989-4190-4247-86af-fac6c8b665bb-c000.snappy.resolved.parquet, I found that one of the most significant failure reasons is "Tie between # results with equal taxonomic matches". This failure is produced by the strategy in exact_match_primary_source_multi_accepted_taxonomic_match.py.
Example:
Current resolution:
{
"uuid": "481396b6-9b8f-4aff-98f7-305fddfb56f3",
"scientific_name": "Cupidopsis jobates (Hopffer, 1855)",
"common_name": "",
"source_dataset": "eol",
"source_id": "20451407",
"resolution_status": "FAILED_FORCED_INPUT",
"kingdom": "Metazoa",
"phylum": "Arthropoda",
"class": "Insecta",
"order": "Lepidoptera",
"family": "Lycaenidae",
"genus": "Pterygota",
"species": "Cupidopsis jobates (Hopffer, 1855)",
"resolution_path": "RESOLVED",
"resolution_strategy": "ForceFailedToInput",
"final_query_term": "Cupidopsis jobates (Hopffer, 1855)",
"final_query_rank": "species",
"final_data_source_id": 11,
"meta_matched_current_name": null,
"meta_matched_result_id": null,
"meta_matched_full_name": null,
"meta_author_disambiguation": null,
"meta_accepted_record_id": null,
"meta_synonym_matched": null,
"meta_accepted_name": null,
"meta_fuzzy_matched_name": null,
"meta_edit_distance": null
}
Trace Entry:
{
"entry": {
"uuid": "481396b6-9b8f-4aff-98f7-305fddfb56f3",
"scientific_name": "Cupidopsis jobates (Hopffer, 1855)",
"common_name": "",
"kingdom": "Metazoa",
"phylum": "Arthropoda",
"class_": "Insecta",
"order": "Lepidoptera",
"family": "Lycaenidae",
"genus": "Pterygota",
"species": "Cupidopsis jobates (Hopffer, 1855)",
"source_dataset": "eol",
"source_id": "20451407"
},
"group": {
"entry_uuids": [
"2c46684b-496b-4d64-b487-4251ebdbe310",
"aa1a82a2-2d50-4681-b7e3-3768c674799e",
"64f7a51d-a0c9-4f34-8b24-1ea6d1e3fab1",
"083599e4-414e-4f5a-97f0-f41ed97e0002",
"60abecc6-b094-41b4-a186-3d9a44f29f37",
"1e67550c-f952-4cdc-a24c-4faf655f9f61",
"43242bc9-074d-4615-b15e-ea9840371518",
"481396b6-9b8f-4aff-98f7-305fddfb56f3",
"56e23a31-b5b3-474f-9e8f-801c44a7948d",
"43a67fdd-261d-42c9-b6d6-d4fd3ec9a023"
],
"kingdom": "Metazoa",
"phylum": "Arthropoda",
"class_": "Insecta",
"order": "Lepidoptera",
"family": "Lycaenidae",
"genus": "Pterygota",
"species": "Cupidopsis jobates (Hopffer, 1855)",
"scientific_name": "Cupidopsis jobates (Hopffer, 1855)",
"common_names": [],
"key": "b23906440604c734648a630620dff46874353e48f5e4723a9f542dc8989f657b",
"group_count": 10
},
"query_plan": {
"term": "Cupidopsis jobates (Hopffer, 1855)",
"rank": "species",
"source_id": 11
},
"resolution_attempts": [
{
"key": "4391cf2052ad6c22c8f18db002d45191e37b9a11d59d54b2695de16e7576eba8",
"entry_group_key": "b23906440604c734648a630620dff46874353e48f5e4723a9f542dc8989f657b",
"query_term": "Cupidopsis jobates (Hopffer, 1855)",
"query_rank": "species",
"data_source_id": 11,
"status": "FAILED",
"is_successful": false,
"is_retry": false,
"previous_key": null,
"resolution_strategy_name": "ExactMatchPrimarySourceMultiAcceptedTaxonomicMatch",
"failure_reason": "Tie between 2 results with equal taxonomic matches",
"resolved_classification": null,
"error": null,
"metadata": {
"match_count": 5,
"total_results": 2,
"tied_results_count": 2,
"tied_record_ids": [
"1929059",
"9894815"
],
"selection_method": "taxonomic_hierarchy_match_tie"
}
}
]
}
This profile strategy currently just fails when it encounters an ambiguity like this one.
However, the matched names are not identical; they differ slightly (e.g. a single-letter spelling difference such as "jobates" vs. "iobates", the presence or absence of an author/year suffix such as "(Hopffer, 1855)", or resolution at a higher rank).
GNverifier verification:
{
"id": "af6739ea-1c38-5677-8dc9-125c112b1a9c",
"name": "Cupidopsis jobates (Hopffer, 1855)",
"cardinality": 2,
"matchType": "Exact",
"results": [
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "1929059",
"outlink": "https://gbif.org/species/1929059",
"entryDate": "2024-01-11",
"sortScore": 9.427395485947873,
"matchedNameID": "af6739ea-1c38-5677-8dc9-125c112b1a9c",
"matchedName": "Cupidopsis jobates (Hopffer, 1855)",
"matchedCardinality": 2,
"matchedCanonicalSimple": "Cupidopsis jobates",
"matchedCanonicalFull": "Cupidopsis jobates",
"currentRecordId": "1929059",
"currentNameId": "af6739ea-1c38-5677-8dc9-125c112b1a9c",
"currentName": "Cupidopsis jobates (Hopffer, 1855)",
"currentCardinality": 2,
"currentCanonicalSimple": "Cupidopsis jobates",
"currentCanonicalFull": "Cupidopsis jobates",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Animalia|Arthropoda|Insecta|Lepidoptera|Lycaenidae|Cupidopsis|Cupidopsis jobates",
"classificationRanks": "kingdom|phylum|class|order|family|genus|species",
"classificationIds": "1|54|216|797|5473|1929057|1929059",
"editDistance": 0,
"stemEditDistance": 0,
"matchType": "Exact",
"scoreDetails": {
"cardinalityScore": 1,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 1,
"curatedDataScore": 0.33333334,
"authorMatchScore": 1,
"acceptedNameScore": 1,
"parsingQualityScore": 1
}
},
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "9894815",
"outlink": "https://gbif.org/species/9894815",
"entryDate": "2024-01-11",
"sortScore": 9.387489602933003,
"matchedNameID": "95cc66ca-a6a8-5d90-8e33-834402e866b4",
"matchedName": "Cupidopsis iobates",
"matchedCardinality": 2,
"matchedCanonicalSimple": "Cupidopsis iobates",
"matchedCanonicalFull": "Cupidopsis iobates",
"currentRecordId": "9894815",
"currentNameId": "95cc66ca-a6a8-5d90-8e33-834402e866b4",
"currentName": "Cupidopsis iobates",
"currentCardinality": 2,
"currentCanonicalSimple": "Cupidopsis iobates",
"currentCanonicalFull": "Cupidopsis iobates",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Animalia|Arthropoda|Insecta|Lepidoptera|Lycaenidae|Cupidopsis|Cupidopsis iobates",
"classificationRanks": "kingdom|phylum|class|order|family|genus|species",
"classificationIds": "1|54|216|797|5473|1929057|9894815",
"editDistance": 1,
"stemEditDistance": 0,
"matchType": "Fuzzy",
"scoreDetails": {
"cardinalityScore": 1,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 0.6666667,
"curatedDataScore": 0.33333334,
"authorMatchScore": 0.14285715,
"acceptedNameScore": 1,
"parsingQualityScore": 1
}
}
],
"curation": "AutoCurated"
}
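To make the differences between tied candidates explicit, one could pull the relevant signals out of each GNverifier result. The sketch below is a hypothetical helper (summarize_result is not part of TaxonoPy); it assumes only the field names visible in the verification output above (matchType, editDistance, matchedName, matchedCanonicalFull, classificationRanks, sortScore, scoreDetails.authorMatchScore).

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: extract tie-relevant signals from one GNverifier result."""
    return {
        "record_id": result["recordId"],
        # GNverifier labels each result "Exact" or "Fuzzy"
        "is_exact": result["matchType"] == "Exact",
        # Edit distance reported by GNverifier for the matched name
        "edit_distance": result["editDistance"],
        # True when the matched name carries authorship beyond its canonical form
        "has_authorship": result["matchedName"] != result["matchedCanonicalFull"],
        "author_score": result["scoreDetails"]["authorMatchScore"],
        # Most specific rank in the candidate's classification
        "matched_rank": result["classificationRanks"].split("|")[-1],
        "sort_score": result["sortScore"],
    }
```

Applied to the two results above, record 1929059 comes out as an exact match (edit distance 0, with authorship, authorMatchScore 1), while record 9894815 is a fuzzy match (edit distance 1, no authorship, authorMatchScore of about 0.14). Both are accepted species-level records, which is why a rule that only looks at the classification cannot separate them.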
But the profile's tie rule says that if there are multiple ‘best matches’ (the taxonomic matches with the highest score at the most specific ranks, from species to final_query_rank), they are considered tied and the resolution fails. See steps 6 and 7 in exact_match_primary_source_multi_accepted_taxonomic_match.py.
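For illustration, the tie rule can be approximated as follows. This is a simplified sketch of the behaviour described above, not the actual code of steps 6 and 7: the rank list, the use of a "class" key (the trace entry uses "class_"), and the helper names count_matching_ranks and select_or_tie are all assumptions.

```python
from typing import Any, Optional

# Ranks compared between the input entry and each candidate (assumed order).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def count_matching_ranks(entry: dict[str, str], result: dict[str, Any]) -> int:
    """Count ranks where the entry's value equals the candidate's value at that rank."""
    candidate = dict(
        zip(result["classificationRanks"].split("|"), result["classificationPath"].split("|"))
    )
    return sum(1 for rank in RANKS if entry.get(rank) and candidate.get(rank) == entry[rank])


def select_or_tie(
    entry: dict[str, str], results: list[dict[str, Any]]
) -> tuple[Optional[dict[str, Any]], Optional[str]]:
    """Return (winner, None), or (None, failure_reason) when there is no result or a tie."""
    if not results:
        return None, "No results"
    scores = [count_matching_ranks(entry, r) for r in results]
    best = max(scores)
    tied = [r for r, s in zip(results, scores) if s == best]
    if len(tied) > 1:
        # Every top-scoring candidate is tied, so the strategy gives up.
        return None, f"Tie between {len(tied)} results with equal taxonomic matches"
    return tied[0], None
```

For the example entry, "Metazoa", "Pterygota" and the authored species string match neither candidate, so both records score on the same four ranks (phylum through family) and the sketch fails with "Tie between 2 results with equal taxonomic matches". The trace reports match_count 5, so the real scoring differs in some detail (such as kingdom handling) that this sketch does not reproduce.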
Consider adding a separate tiebreaker, applied only when a tie is detected, that picks the best match among these slightly different candidates, while being careful not to be too lax about what gets through.
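One possible shape for such a tiebreaker is sketched below. It is a conservative, hypothetical rule (break_tie and the order of its criteria are assumptions, not existing TaxonoPy behaviour) that uses only GNverifier fields shown in the verification output: it prefers a unique exact match with edit distance 0, falls back to a strictly higher authorMatchScore among several exact matches, and otherwise returns None so the original tie failure is kept.

```python
from typing import Any, Optional


def break_tie(tied: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Hypothetical conservative tiebreaker over tied GNverifier results.

    Returns the single winning result, or None to keep the
    "Tie between N results" failure.
    """
    # 1. Prefer exact matches with no edits to the name.
    exact = [r for r in tied if r["matchType"] == "Exact" and r["editDistance"] == 0]
    if len(exact) == 1:
        return exact[0]
    if not exact:
        # Every candidate is a fuzzy match: too risky to pick one.
        return None
    # 2. Several exact matches: use author agreement, but only when a
    #    single candidate is strictly best.
    scores = [r["scoreDetails"]["authorMatchScore"] for r in exact]
    best = max(scores)
    if scores.count(best) == 1:
        return exact[scores.index(best)]
    return None
```

On the example above, this picks record 1929059, the exact match that also carries the "(Hopffer, 1855)" authorship, over the fuzzy "Cupidopsis iobates". A stricter variant could require a minimum margin in authorMatchScore, or refuse to break ties between several exact matches at all, if step 2 turns out to let through too much.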