new component to build github-based rocrates metadata files #142

@mpo-vliz

We want a more intuitive way to produce rocrate-metadata.json files, based on the limited information and view the typical user has of their dataset plus the actual content, avoiding the need to fiddle with many technical details and syntax.

The output of this process (dubbed the sema.rocreator) will be a locally produced ./rocrate-metadata.json file.
The input would be the combination of:

  • ./** a root folder to the content - also where the resulting json file will be placed
  • ./rocreator.yml providing basic information driving the process

An initial draft of such a yml is included below, although further in-depth discussion is needed:

# we assume we can use the !resolve trick to inject env variables
# we already have code for that in sema.bench.core --> it should move into a shared commons module

# refers to a "strategy / way of working / set of assumptions and features" known to the rocreator
uses: emobon-observatory  # ALSO: check whether yaml has a built-in extension mechanism that allows merging in some defaults

# main public identifier for the crate -- probably optional, as this does not need to end up in the crate itself?
base: !resolve "{domain}/{repo}/"  # quoted, since a bare { starts a yaml flow mapping -- to resolve to e.g. https://data.emobon.embrc.eu/my-set-crate/

# optionally have same-as refs
sameas:
  - doi:4886060.34/whatever

# provide ro-crate specific hooks
ro:
  version: 1.2-DRAFT    # the version to use
  profile: !resolve "{domain}/observatory-profile/latest"   # the profile to comply with -- could be set as part of the "strategy-in-use"

# essential contextual entities
license: https://creativecommons.org/licenses/by/4.0/  # should probably be the default


# associated people and organisations
agents:  # list them all - each key should be a valid @id to use, in the best case even a dereferenceable one
  # every nested key should be an orcid url --> alternative identifiers could be oceanexpert-id, url, edmoid, rorid, ...
  # per agent a uri-identifier should suffice - the rocreator impl could (again optionally) fetch associated triples and slide in more
  https://orcid.org/....:  # First LastName --> note that yml comments can be slid in to give visual clues to remember what this was about
    roles: [ owner, contributor, contact ]   # selection of roles this agent is playing here
    sameas:  # space to list possible other identifiers for the same agent
    key: value  # extensible extra range of info one would want to add

# what should go into this crate
content:
  ignorecase: yes | no | tolower   # how to handle the case of filenames when finding and processing them?
  includes:  # list of glob patterns to include -- defaults to **/* if none are provided
    - "**/*.csv"
    - "**/*.tsv"
    - "**/*.xml"
  excludes:  # list of match patterns to exclude (evaluated after the includes, so they take precedence)
    - .*ignore   # any of gitignore, dockerignore, ...
    - .github/    # all workflow stuff
  autofill:   # content to be added to entities being inserted - a list of match patterns, with nested key-values
     - "*.csv":   # quoted, since a bare leading * is read as a yaml alias
            content-type: text/csv; charset=utf8
            key: value
  no-auto-fill:  # disable autofill for certain matching patterns
     - http*
  list:   # manual list of local and remote entities to be added as well; autofill gets applied to them, plus any extra key-values to overwrite
    - "./specialfile":  # additional individual file
          field-name: field-value
    - https://whatever:
        key: value
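
To make the intended includes/excludes semantics concrete, here is a minimal sketch of the content selection step. It is only an illustration under stated assumptions: the function name `select_content` is hypothetical, folder patterns ending in `/` are treated as prefixes anchored at the root, other exclude patterns are matched against both the relative path and the bare filename, and the `ignorecase` option is not handled.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def select_content(root, includes=None, excludes=None):
    """Return sorted relative paths of files under root that pass the filters.

    Hypothetical helper: includes are glob patterns (default **/*),
    excludes are applied afterwards and therefore win.
    """
    root = Path(root)
    includes = includes or ["**/*"]
    excludes = excludes or []

    # Step 1: collect every file matched by at least one include pattern.
    selected = set()
    for pattern in includes:
        for path in root.glob(pattern):
            if path.is_file():
                selected.add(path.relative_to(root).as_posix())

    # Step 2: drop anything matched by an exclude pattern.
    def is_excluded(rel):
        name = rel.rsplit("/", 1)[-1]
        for pattern in excludes:
            if pattern.endswith("/"):
                # Assumption: folder patterns are prefixes relative to the root.
                if rel.startswith(pattern):
                    return True
            elif fnmatchcase(rel, pattern) or fnmatchcase(name, pattern):
                return True
        return False

    return sorted(rel for rel in selected if not is_excluded(rel))
```

With the draft above, a call would look like `select_content(".", ["**/*.csv", "**/*.tsv", "**/*.xml"], [".*ignore", ".github/"])`. Whether excludes should also match nested folders (e.g. `sub/.github/`) is one of the details still to be decided.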

Use cases - provide a rocrate-metadata.json for:

  • basic content examples
  • the emobon-profile crates
  • the emobon-observatory crates
  • ... others to consider?

Approach:

  • target users who only have a content view on the dataset, with no knowledge of the technical details of rocrates or rdf
  • make things as simple and concise as possible for end-users (sensible defaults)
  • design for extensibility -- allow for plugging in specific extra code to cater for other use cases / specific strategies
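
On the "sensible defaults" and extensibility points: YAML 1.1 merge keys (`<<`) only work with anchors defined in the same document, so defaults shipped with the rocreator would probably have to be merged in code. A minimal sketch of that idea follows; the `STRATEGIES` registry, its contents and the function names are hypothetical, and the defaults shown are placeholders rather than an agreed profile.

```python
# Hypothetical registry: strategy name -> default config values.
STRATEGIES = {
    "emobon-observatory": {
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "ro": {"version": "1.2-DRAFT"},
    },
}


def deep_merge(defaults, overrides):
    """Return a new dict where overrides win, merging nested dicts recursively."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_strategy(config, strategies=STRATEGIES):
    """Fill in the defaults of the strategy named under 'uses', if any.

    An unknown strategy name raises KeyError, so typos surface early.
    """
    name = config.get("uses")
    defaults = strategies[name] if name else {}
    return deep_merge(defaults, config)
```

A user config that only sets `uses: emobon-observatory` plus a `ro.profile` would then end up with the license and ro version filled in, while anything the user states explicitly overrides the strategy defaults.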

Way forward:

  • I would like us to think about what to build first
    • make a limited number of examples showing input yml + input folder, and the expected output they should lead to
    • actually use those as test cases & documentation
  • this and the actual dev could materialise in a branch + PR associated with this issue
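
As a starting point for such examples, the sketch below shows one possible shape of "file list + parsed yml in, metadata out". Everything in it is an assumption for discussion: the function name `build_metadata` is hypothetical, it uses the RO-Crate 1.1 context although the draft names 1.2-DRAFT, autofill key-values are copied verbatim (mapping `content-type` to a schema.org term such as `encodingFormat` is still open), and the descriptor id defaults to `ro-crate-metadata.json`, the filename the RO-Crate spec uses.

```python
import json
from fnmatch import fnmatchcase

# Assumption: RO-Crate 1.1 identifiers; the draft yml mentions 1.2-DRAFT.
CONTEXT = "https://w3id.org/ro/crate/1.1/context"
SPEC = "https://w3id.org/ro/crate/1.1"


def build_metadata(files, config, descriptor_id="ro-crate-metadata.json"):
    """Build a minimal crate graph from relative file paths and a parsed config."""
    autofill = config.get("content", {}).get("autofill", [])

    parts = []
    for rel in files:
        entity = {"@id": rel, "@type": "File"}
        # Each autofill rule is a single-key mapping: pattern -> extra key-values.
        for rule in autofill:
            for pattern, extra in rule.items():
                if fnmatchcase(rel, pattern):
                    entity.update(extra)
        parts.append(entity)

    root = {
        "@id": "./",
        "@type": "Dataset",
        "hasPart": [{"@id": part["@id"]} for part in parts],
    }
    if "license" in config:
        root["license"] = {"@id": config["license"]}

    descriptor = {
        "@id": descriptor_id,
        "@type": "CreativeWork",
        "conformsTo": {"@id": SPEC},
        "about": {"@id": "./"},
    }
    return {"@context": CONTEXT, "@graph": [descriptor, root, *parts]}


if __name__ == "__main__":
    example_config = {
        "license": "https://creativecommons.org/licenses/by/4.0/",
        "content": {"autofill": [{"*.csv": {"content-type": "text/csv; charset=utf8"}}]},
    }
    print(json.dumps(build_metadata(["b.tsv", "data/a.csv"], example_config), indent=2))
```

An example folder, its yml and the expected json could then be compared against the output of such a function in a test, which would double as documentation.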

See also the yml-resolve code at

loader.add_constructor("!resolve", resolver)
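
For readers unfamiliar with the trick, the following is a minimal sketch of how a `!resolve` tag can be wired up with PyYAML. It is not the actual sema.bench.core code: the `make_loader` factory and the explicit `variables` mapping are assumptions made here so the example stays self-contained; passing `os.environ` as the mapping would inject environment variables.

```python
import yaml


def make_loader(variables):
    """Return a SafeLoader subclass whose !resolve tag fills {name} placeholders.

    Hypothetical helper: 'variables' is any mapping, e.g. os.environ.
    """

    class ResolvingLoader(yaml.SafeLoader):
        """Subclass, so the extra tag does not leak into yaml.SafeLoader itself."""

    def resolver(loader, node):
        # The tagged value must be a scalar; a missing variable raises KeyError.
        template = loader.construct_scalar(node)
        return template.format_map(variables)

    ResolvingLoader.add_constructor("!resolve", resolver)
    return ResolvingLoader


def load_config(text, variables):
    """Parse yml text, resolving !resolve-tagged scalars against variables."""
    return yaml.load(text, Loader=make_loader(variables))
```

For example, `load_config('base: !resolve "{domain}/{repo}/"', {"domain": "https://data.emobon.embrc.eu", "repo": "my-set-crate"})` resolves `base` to the crate URL from the draft. Note that the template has to be quoted in the yml, because an unquoted `{` starts a flow mapping.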
