General Improvements #27

@alrichardbollans

Description

  • Improve outputs to align with PfH, in particular include 'matched to' (with authors).
  • If Kew sftp is down, the package won't load even a local copy of the checklist.
  • Improve uninstall by adding dummy wcvp file
  • Output matched wcvp id
  • Include authors of accepted name?
  • Improve 'matched_by' output:
    • some taxa are not given author information in WCVP (e.g. Caralluma adscendens var. adscendens) and these are being incorrectly given 'direct_match_w_author'
    • Output a column with the match state e.g. 'unique', 'ambiguous' or 'unmatched'
    • Add method to summarise matched_by column of output
  • When first running the package, if the wcvp download is interrupted, the associated zip file is unusable and an error is raised on the next usage. Fix this by catching errors and redownloading
  • _capitalize_first_letter_of_taxon method raises an error when given an empty string ('')
  • Add checks for all input string parameters e.g. catch errors in spelling of family names, taxon ranks etc.
  • Improve handling of different encodings
  • String cleaning:
    • Use OpenRefine string transformers?
    • remove characters like "
    • remove full stops at the end of the entire string (both in given name and wcvp); authors are very commonly given with or without full stops, which can prevent matching
  • Add extra steps prior to autoresolution: (1) using openrefine (2) some sort of fuzzy matching
  • When genus has been found, use algorithm like fuzzywuzzy to match species names. Similar to approach in taxonstand
  • Improve support for common misspellings e.g. y -> ii. OpenRefine improves this but still some issues
  • Input currently has to be a pandas DataFrame plus the name of the column containing the names, but it would be useful to allow varied formats e.g. a simple list of names
  • Add versioning to distribution lists
  • Add distribution plotting methods
  • Add a genus column to outputs with acc_data['accepted_genus'] = acc_data[wcvp_accepted_columns['name']].apply(get_genus_from_full_name)
  • Allow specifying a directory for wcvp downloads instead of within package, to avoid repeated downloads
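
For the string cleaning and empty-string items above, a minimal sketch of what the normalisation might look like. The function names here are hypothetical and not part of the package; the idea is to apply the same cleaning to both the given name and the wcvp name before comparing, and to return '' unchanged rather than raising.

```python
import re

# Hypothetical helpers for illustration only; not the package's existing API.
# Straight and curly double quotes to drop before matching.
_DOUBLE_QUOTES = '"\u201c\u201d'


def clean_name_string(name: str) -> str:
    """Normalise a name string: drop double quotes, collapse whitespace,
    and remove full stops at the very end of the string."""
    if not name:
        return ""
    cleaned = name.translate(str.maketrans("", "", _DOUBLE_QUOTES))
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Only trailing full stops are removed; internal ones (e.g. "(Roxb.)") stay.
    return cleaned.rstrip(".").rstrip()


def capitalize_first_letter(name: str) -> str:
    """Capitalise the first character, returning empty input unchanged."""
    if not name:
        return name
    return name[0].upper() + name[1:]
```

With this, 'Caralluma adscendens Roxb.' and 'Caralluma adscendens Roxb' clean to the same string, so the presence or absence of the final full stop no longer affects the comparison.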
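
For the fuzzy matching item above (matching species names once the genus has been found), a rough sketch follows. It uses the standard library's difflib as a stand-in for fuzzywuzzy, and the function name, the 0.8 cutoff and the example epithets are assumptions for illustration only.

```python
from difflib import get_close_matches
from typing import Optional, Sequence


def fuzzy_match_species(
    epithet: str, known_epithets: Sequence[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the closest known epithet within a genus, or None.

    known_epithets is assumed to hold the lowercase epithets that the
    checklist lists for the already-matched genus.
    """
    if not epithet:
        return None
    matches = get_close_matches(epithet.lower(), known_epithets, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Example: a one-letter misspelling within the genus Caralluma.
known = ["adscendens", "edulis", "fimbriata"]
print(fuzzy_match_species("adscendans", known))  # adscendens
```

Restricting the candidates to one genus keeps the comparison cheap and reduces the chance of matching to an unrelated taxon. The cutoff would need tuning against real data.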
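
For the interrupted download item above, one possible approach is to validate the zip before use and redownload when it is unusable. This is a sketch only: `download` is a hypothetical callable standing in for whatever the package uses to fetch the checklist, and the function names are not existing API.

```python
import zipfile
import zlib
from pathlib import Path
from typing import Callable


def is_usable_zip(path: Path) -> bool:
    """True if path exists, is a zip archive, and every member passes its CRC check."""
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first corrupt member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, EOFError, OSError, zlib.error):
        return False


def ensure_checklist_zip(
    path: Path, download: Callable[[Path], None], max_attempts: int = 2
) -> Path:
    """Return path to a usable zip, calling download at most max_attempts times."""
    for _ in range(max_attempts):
        if is_usable_zip(path):
            return path
        # Remove the partial or corrupt file left by an interrupted download.
        path.unlink(missing_ok=True)
        download(path)
    if is_usable_zip(path):
        return path
    raise RuntimeError(f"Could not obtain a usable zip at {path}")
```

Taking the target path as an argument would also fit the item about allowing a user-specified download directory.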

Labels: enhancement