Status: Open
Labels: bug (Something isn't working), edge case (A resolution edge case that has either been unaddressed or incorrectly addressed by the workflow)
Description
While exploring TOL-200M, we noticed 223,284 entries that resolve to the following taxonomic hierarchy:
kingdom: Plantae
phylum: null
class: null
order: null
family: null
genus: Plantae
species: null
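A minimal sketch for flagging this pattern, assuming each entry is available as a Python dict keyed by the rank names used in the example entry below. Both None (as in the input rows) and empty strings (as in the resolved output) count as empty. The function names are hypothetical and not part of the workflow.

```python
from typing import Iterable, Mapping, Optional

# Ranks that are empty between kingdom and genus in the reported path.
INTERMEDIATE_RANKS = ("phylum", "class", "order", "family")


def _is_empty(value: Optional[str]) -> bool:
    """Input rows use None for missing ranks; resolved rows use ''."""
    return value is None or value == ""


def has_kingdom_as_genus_path(row: Mapping[str, Optional[str]]) -> bool:
    """True when genus repeats the kingdom name and every other rank is empty."""
    kingdom = row.get("kingdom")
    if _is_empty(kingdom) or row.get("genus") != kingdom:
        return False
    others = INTERMEDIATE_RANKS + ("species",)
    return all(_is_empty(row.get(rank)) for rank in others)


def count_kingdom_as_genus(rows: Iterable[Mapping[str, Optional[str]]]) -> int:
    """Count rows matching the pattern, e.g. across one resolved parquet file."""
    return sum(1 for row in rows if has_kingdom_as_genus_path(row))
```

To apply it to a resolved file, one option is `pyarrow.parquet.read_table(path).to_pylist()`, which yields one dict per row.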
Example of a resolved entry:
{'uuid': 'e571a2d3-07f3-45fe-b372-67c0594de0c8',
'scientific_name': 'Plantae',
'common_name': 'plants; plants',
'source_dataset': 'gbif',
'source_id': '1038946915',
'resolution_status': 'EXACT_MATCH_PRIMARY_SOURCE_ACCEPTED_AUTHOR_DISAMBIGUATION',
'kingdom': 'Plantae',
'phylum': '',
'class': '',
'order': '',
'family': '',
'genus': 'Plantae',
'species': '',
'resolution_path': 'RESOLVED',
'resolution_strategy': 'ExactMatchPrimarySourceAcceptedAuthorDisambiguation',
'final_query_term': 'Plantae',
'final_query_rank': 'scientific_name',
'final_data_source_id': 11,
'meta_selected_record_id': None,
'meta_candidate_count': None,
'meta_accepted_record_id': None,
'meta_matched_result_id': '11387779',
'meta_matched_full_name': 'Plantae',
'meta_author_disambiguation': 'true',
'resolution_failure_reason': None,
'meta_original_status': None,
'meta_force_failed_to_input': None,
'meta_original_attempt_key': None,
'source_file': 'part-00000-3990aff1-4728-49f9-bd76-087ff566f4fc-c000.snappy.resolved.parquet',
'meta_matched_current_name': None,
'meta_synonym_matched': None,
'meta_accepted_name': None,
'meta_disambiguated_record_id': None}
Corresponding Input:
>>> row_dict
{'source_id': '1038946915', 'uuid': 'e571a2d3-07f3-45fe-b372-67c0594de0c8', 'scientific_name': 'Plantae', 'phylum': None, 'class': None, 'order': None, 'family': None, 'genus': None, 'species': None, 'kingdom': 'Plantae', 'common_name': 'plants; plants'}
(from source_taxa/source=gbif/part-00000-3990aff1-4728-49f9-bd76-087ff566f4fc-c000.snappy.parquet)
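Comparing the input row with the resolved entry makes the problem concrete: the input's most specific rank is kingdom, but resolution added a genus. A minimal sketch, assuming both rows are plain dicts with the rank keys shown above; the helper names are hypothetical.

```python
from typing import List, Mapping, Optional

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def _is_filled(value: Optional[str]) -> bool:
    """Treat both None (input) and '' (resolved output) as missing."""
    return value is not None and value != ""


def most_specific_rank(row: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the lowest populated rank, or None if no rank is populated."""
    filled = [rank for rank in RANKS if _is_filled(row.get(rank))]
    return filled[-1] if filled else None


def ranks_added_by_resolution(
    input_row: Mapping[str, Optional[str]],
    resolved_row: Mapping[str, Optional[str]],
) -> List[str]:
    """Ranks that are empty in the input but populated after resolution."""
    return [
        rank
        for rank in RANKS
        if not _is_filled(input_row.get(rank)) and _is_filled(resolved_row.get(rank))
    ]
```

For the example above, `most_specific_rank` gives "kingdom" for the input and "genus" for the resolved entry, and `ranks_added_by_resolution` returns ["genus"].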
GNverifier query result for "Plantae" against the GBIF Backbone Taxonomy (data source 11):
$ docker run --rm -i gnames/gnverifier:v1.2.5 -j 1 --format compact --capitalize --all_matches --sources 11 "Plantae" | jq
{
"id": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
"name": "Plantae",
"cardinality": 1,
"matchType": "Exact",
"results": [
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "11387779",
"outlink": "https://gbif.org/species/11387779",
"entryDate": "2024-01-11",
"sortScore": 9.410739851747246,
"matchedNameID": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
"matchedName": "Plantae",
"matchedCardinality": 1,
"matchedCanonicalSimple": "Plantae",
"matchedCanonicalFull": "Plantae",
"currentRecordId": "11387779",
"currentNameId": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
"currentName": "Plantae",
"currentCardinality": 1,
"currentCanonicalSimple": "Plantae",
"currentCanonicalFull": "Plantae",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Plantae|Plantae",
"classificationRanks": "kingdom|genus",
"classificationIds": "6|11387779",
"editDistance": 0,
"stemEditDistance": 0,
"matchType": "Exact",
"scoreDetails": {
"cardinalityScore": 1,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 1,
"curatedDataScore": 0.33333334,
"authorMatchScore": 0.14285715,
"acceptedNameScore": 1,
"parsingQualityScore": 1
}
},
{
"dataSourceId": 11,
"dataSourceTitleShort": "GBIF Backbone Taxonomy",
"curation": "AutoCurated",
"recordId": "6",
"outlink": "https://gbif.org/species/6",
"entryDate": "2024-01-11",
"sortScore": 9.410739851747246,
"matchedNameID": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
"matchedName": "Plantae",
"matchedCardinality": 1,
"matchedCanonicalSimple": "Plantae",
"matchedCanonicalFull": "Plantae",
"currentRecordId": "6",
"currentNameId": "827f5f3d-f332-5d4e-9ec9-6dbf1b07bdd9",
"currentName": "Plantae",
"currentCardinality": 1,
"currentCanonicalSimple": "Plantae",
"currentCanonicalFull": "Plantae",
"taxonomicStatus": "Accepted",
"isSynonym": false,
"classificationPath": "Plantae",
"classificationRanks": "kingdom",
"classificationIds": "6",
"editDistance": 0,
"stemEditDistance": 0,
"matchType": "Exact",
"scoreDetails": {
"cardinalityScore": 1,
"infraSpecificRankScore": 0,
"fuzzyLessScore": 1,
"curatedDataScore": 0.33333334,
"authorMatchScore": 0.14285715,
"acceptedNameScore": 1,
"parsingQualityScore": 1
}
}
],
"curation": "AutoCurated"
}
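To reproduce the query from Python and surface tied candidates, here is a sketch assuming Docker is available and that, for a single name, stdout holds one JSON object (as the jq pipe above suggests). The image tag and flags are copied from the command above; the function names are hypothetical.

```python
import json
import math
import subprocess
from typing import Any, Dict, List


def gnverifier_command(name: str, source_id: int = 11) -> List[str]:
    """Argv equivalent of the docker command above."""
    return [
        "docker", "run", "--rm", "-i", "gnames/gnverifier:v1.2.5",
        "-j", "1", "--format", "compact", "--capitalize", "--all_matches",
        "--sources", str(source_id), name,
    ]


def query_gnverifier(name: str, source_id: int = 11) -> Dict[str, Any]:
    """Run the query (needs Docker) and parse the single compact JSON line."""
    completed = subprocess.run(
        gnverifier_command(name, source_id),
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def record_rank(result: Dict[str, Any]) -> str:
    """Rank of the matched record itself: the last entry in classificationRanks."""
    return result["classificationRanks"].split("|")[-1]


def top_ties(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """All results sharing the highest sortScore; more than one means a tie."""
    results = response.get("results") or []
    if not results:
        return []
    best = max(r["sortScore"] for r in results)
    return [r for r in results if math.isclose(r["sortScore"], best)]
```

For "Plantae", `top_ties` returns both records, and mapping `record_rank` over them gives ["genus", "kingdom"], so the score alone cannot separate the genus homonym from the kingdom.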
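The exact_match_primary_source_accepted_author_disambiguation.py profile incorrectly matches this case. GNverifier returns two accepted exact matches for "Plantae" with identical sortScore and scoreDetails: record 11387779, a genus (classificationRanks "kingdom|genus"), and record 6, the kingdom. The profile selected record 11387779 (see meta_matched_result_id), which produces genus: Plantae even though the input supplies Plantae only as a kingdom. Suspected cause: step 5 of the profile is not specific enough.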
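One possible direction for a stricter check, sketched as a hypothetical rule rather than the profile's actual step 5 (which is not shown in this issue): when several accepted exact matches tie, keep only the candidate whose own rank equals the rank at which the input supplied the queried name, and return nothing unless exactly one candidate remains. Rank keys follow the input row above and result fields follow the GNverifier output; the function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Optional

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def expected_rank(input_row: Mapping[str, Optional[str]]) -> Optional[str]:
    """Rank whose input value equals scientific_name ('kingdom' for Plantae)."""
    name = input_row.get("scientific_name")
    if not name:
        return None
    for rank in RANKS:
        if input_row.get(rank) == name:
            return rank
    return None


def pick_by_rank(
    candidates: List[Dict[str, Any]],
    input_row: Mapping[str, Optional[str]],
) -> Optional[Dict[str, Any]]:
    """Return the single accepted candidate at the expected rank, else None."""
    rank = expected_rank(input_row)
    if rank is None:
        return None
    matching = [
        c
        for c in candidates
        if c.get("taxonomicStatus") == "Accepted"
        and c.get("classificationRanks", "").split("|")[-1] == rank
    ]
    return matching[0] if len(matching) == 1 else None
```

Returning None instead of guessing would let the workflow fall through to another strategy; for the input above, this rule would select record 6 (the kingdom).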