Reformulate as TreeOfLife-toolbox #20

@egrace479

The repo should be presented specifically as taking the output of distributed-downloader and then doing everything listed below to create the TreeOfLife dataset.

  • Rename repo: TreeOfLife-toolbox
  • Rename under src: TOL-toolbox
  • Reorganize under TOL-toolbox to have submodules based on function (e.g., filtering, data-transfer) with sub-submodules (e.g., research-filtering)
  • Add from TreeOfLife-dev as submodules in src/TOL-toolbox:
    • webdataset creation
    • metadata transfer
    • lookup table creator
  • Add all lookup table exclude creators from TreeOfLife-dev to a root-level scripts/ directory (many of these are here):
    • PDQ hash process
    • Image type separator (museum, camera trap, and citizen science categories)
    • Museum Specimen processing: label vs specimen filter
    • Camera trap processing: MegaDetector & reducer
    • citizen science image processing
    • face detection
  • Add all lookup table exclude creator notebooks from TreeOfLife-dev to a root-level notebooks/ directory:
    • clustering and other EDA determinations
  • Add root-level data/ folder with embeddings for museum support sets (relates to Add processing tools for exclusion sets #22).

This will be done following the merge of PRs #1 through #15.
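
To make the proposed reorganization easier to picture, here is a rough sketch that renders the layout described above as an indented tree. It is only an illustration under assumptions: the issue names `TreeOfLife-toolbox`, `src/TOL-toolbox`, `filtering`, `data-transfer`, `research-filtering`, `scripts/`, `notebooks/`, and `data/`, but the directory names `webdataset-creation`, `metadata-transfer`, and `lookup-table-creator` are hypothetical placeholders, as is their placement as siblings of the function-based submodules.

```python
# Sketch of the proposed repo layout as a nested dict.
# Names marked "assumed" are hypothetical placeholders, not taken from the issue.
LAYOUT = {
    "TreeOfLife-toolbox": {
        "src": {
            "TOL-toolbox": {
                "filtering": {"research-filtering": {}},
                "data-transfer": {},
                "webdataset-creation": {},   # assumed name
                "metadata-transfer": {},     # assumed name
                "lookup-table-creator": {},  # assumed name
            }
        },
        "scripts": {},    # lookup table exclude creators
        "notebooks": {},  # clustering and other EDA
        "data": {},       # embeddings for museum support sets
    }
}


def render(tree, indent=0):
    """Return the layout as a list of indented directory lines."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * indent + name + "/")
        lines.extend(render(children, indent + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(LAYOUT)))
```

Running the script prints one directory per line, indented by depth, which can be pasted into the README once the final names are agreed on.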

Labels

  • design: UX or presentation needs attention
  • documentation: Improvements or additions to documentation
  • structure: Refactoring or architecture, general code organization
