new component to build github-based rocrates metadata files

We want a more intuitive way to produce rocrate-metadata-json files, based on the limited information and view a typical user has of their dataset plus the actual content, without the need to fiddle with many technical details and syntax.

The output of this process (dubbed `sema.rocreator`) will be a locally produced `./rocrate-metadata.json` file.
The input would be the combination of:
- `./**` a root folder holding the content, which is also where the resulting json file will be placed
- `./rocreator.yml` providing the basic information driving the process

An initial draft of such a yml is included below, although further in-depth discussion is needed:

```yml
# we assume we can use the !resolve trick to inject env variables
# we already have code for that in sema.bench.core --> should move into a shared commons module

# refers to a "strategy / way of working / set of assumptions and features" known to the rocreator
uses: emobon-observatory  # ALSO: check whether yaml has a built-in extension mechanism that allows merging in some defaults

# main public identifier for the crate -- probably optional, as this does not need to end up in the crate itself?
base: !resolve "{domain}/{repo}/"  # to resolve to e.g. https://data.emobon.embrc.eu/my-set-crate/

# optionally have same-as refs
sameas:
  - doi:4886060.34/whatever

# provide ro-crate specific hooks
ro:
  version: 1.2-DRAFT    # the version to use
  profile: !resolve "{domain}/observatory-profile/latest"   # the profile to comply with -- could be set as part of the "strategy-in-use"

# essential contextual entities
license: https://creativecommons.org/licenses/by/4.0/  # should probably be the default


# associated people and organisations
agents:  # list them all - each key should be a valid @id to use, in the best case even dereferenceable
  # every nested key should be an orcid url --> alternative identifiers could be oceanexport-id, url, edmoid, rorid, ...
  # per agent a uri-identifier should suffice - the rocreator impl could (again optionally) fetch associated triples and slide in more
  "https://orcid.org/....":  # First LastName --> note that yml comments can be slid in to give visual clues to remember what this was about
    roles: [ owner, contributor, contact ]   # selection of roles this agent is playing here
    sameas:  # space to list possible other identifiers for the same agent
    key: value  # extensible extra range of info one would want to add

# what should go into this crate
content:
  ignorecase: yes | no | tolower   # how to handle the case of filenames when finding and processing them?
  includes:  # list of glob patterns to include -- defaults to **/* if none are provided
    - "**/*.csv"
    - "**/*.tsv"
    - "**/*.xml"
  excludes:  # list of match patterns to exclude (evaluated after include, so are stronger)
    - .*ignore   # any of gitignore, dockerignore, ...
    - .github/    # all workflow stuff
  autofill:   # content to be added to entities being inserted - a list of match patterns, with nested key-values
    - "*.csv":
        content-type: text/csv; charset=utf-8
        key: value
  no-auto-fill:  # disable autofill for certain matching patterns
    - "http*"
  list:   # manual list of local and remote entities to be added as well; autofill gets applied to them, plus any additional key-values to overwrite
    - "./specialfile":  # additional individual file
          field-name: field-value
    - https://whatever:
        key: value

```
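To make the `content` section of the draft more tangible, here is a minimal Python sketch of how `includes`, `excludes` and `autofill` could be evaluated. Nothing here exists yet: `select_files` and `build_entities` are hypothetical names, the exclude semantics (a pattern matches either the whole relative path or any single path segment) is an assumption, and the autofill keys are simply copied onto each entity as-is.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_files(root, includes=None, excludes=None):
    """Return the sorted relative POSIX paths of the files selected under root."""
    root = Path(root)
    includes = includes or ["**/*"]  # default mentioned in the draft yml
    excludes = excludes or []

    selected = set()
    for pattern in includes:
        for path in root.glob(pattern):
            if path.is_file():
                selected.add(path.relative_to(root).as_posix())

    def is_excluded(rel):
        parts = rel.split("/")
        for pattern in excludes:
            name = pattern.rstrip("/")
            # assumption: an exclude hits the full relative path or any path segment
            if fnmatch(rel, name) or any(fnmatch(part, name) for part in parts):
                return True
        return False

    # excludes are evaluated after includes, so they win
    return sorted(rel for rel in selected if not is_excluded(rel))


def build_entities(paths, autofill=None):
    """Turn selected paths into entity dicts, merging in matching autofill values."""
    entities = []
    for rel in paths:
        entity = {"@id": rel, "@type": "File"}
        for rule in autofill or []:  # a list of {pattern: {key: value}} items, as in the yml
            for pattern, values in rule.items():
                if fnmatch(rel, pattern):
                    entity.update(values)
        entities.append(entity)
    return entities
```

Something like `build_entities(select_files("."), autofill=[{"*.csv": {"content-type": "text/csv; charset=utf-8"}}])` would then yield the file entities to place in the crate. Handling of `ignorecase`, `no-auto-fill` and `list` is left out on purpose, since their semantics still need discussion.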


Use cases: provide rocrate-metadata-json for
* basic content examples
* the emobon-profile crates
* the emobon-observatory crates
* ... others to consider?

Approach:
- target users who only have a content view of the dataset, not of any technical details (neither rocrates nor rdf)
- make things as simple and concise as possible for end-users (sensible defaults)
- design for extensibility -- allow for plugging in specific extra code to cater for other use cases / specific strategies
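As a starting point for the extensibility discussion, the sketch below shows one possible way the `uses:` key could select a pluggable strategy. It is only an illustration: the registry, `register_strategy`, `apply_strategy` and the defaults attached to `emobon-observatory` are all hypothetical, and the merge is a plain shallow one.

```python
DEFAULT_LICENSE = "https://creativecommons.org/licenses/by/4.0/"

# hypothetical registry: strategy name -> function(config) -> config with defaults
_STRATEGIES = {}


def register_strategy(name):
    """Decorator that makes a strategy available under the given `uses:` name."""
    def decorator(func):
        _STRATEGIES[name] = func
        return func
    return decorator


@register_strategy("emobon-observatory")
def emobon_observatory(config):
    # assumption: a strategy mainly contributes defaults the user may override
    defaults = {"license": DEFAULT_LICENSE}
    return {**defaults, **config}


def apply_strategy(config):
    """Return a copy of config completed by the strategy named in `uses`, if any."""
    name = config.get("uses")
    if name is None:
        return dict(config)
    if name not in _STRATEGIES:
        raise ValueError(f"unknown strategy: {name!r}")
    return _STRATEGIES[name](config)
```

With this shape, extra strategies could live in separate modules and register themselves on import, while values given explicitly in `rocreator.yml` always win over strategy defaults.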

Way forward:
- I would like us to think about what to build first
    - make a limited number of examples showing the input yml + input folder, and the expected output they should lead to
    - actually use those as test cases & documentation
- this and the actual dev could materialise in a branch + PR associated with this issue
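To illustrate how such examples could double as test cases, here is a small Python sketch. The folder layout (`<example>/input/` next to `<example>/expected.json`) and the `check_examples` helper are assumptions made for the sake of the example, and `generate` stands in for whatever the rocreator entry point ends up being.

```python
import json
from pathlib import Path


def check_examples(examples_dir, generate):
    """Run generate(input_folder) for each example and list those that differ.

    Assumed layout (hypothetical):
        examples_dir/<name>/input/          -> content + rocreator.yml
        examples_dir/<name>/expected.json   -> the metadata we expect to get
    """
    failures = []
    for example in sorted(Path(examples_dir).iterdir()):
        expected_file = example / "expected.json"
        if not example.is_dir() or not expected_file.exists():
            continue  # not an example folder
        expected = json.loads(expected_file.read_text(encoding="utf-8"))
        actual = generate(example / "input")
        if actual != expected:
            failures.append(example.name)
    return failures
```

An empty list means every example produced its expected output, so the same folders can be linked from the docs and run in CI.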

See also the yml-resolve code at https://github.com/vliz-be-opsci/py-sema/blob/48ef7cb8e3a3fde1a4f7ab5cd76a062f76b7c1b9/sema/bench/core.py#L147
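For readers who do not want to open the link, the idea behind `!resolve` can be sketched in a few lines of Python. This is not a copy of the linked code, just an assumption of how a template like `{domain}/{repo}/` could be filled from environment variables; the `resolve` function name and its error message are made up here.

```python
import os


def resolve(template, env=None):
    """Fill {name} placeholders in template from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    try:
        return template.format_map(env)
    except KeyError as err:
        # fail loudly rather than emit a half-resolved identifier
        raise KeyError(
            f"cannot resolve {template!r}: variable {err.args[0]!r} is not set"
        ) from None
```

For instance, `resolve("{domain}/{repo}/", {"domain": "https://data.emobon.embrc.eu", "repo": "my-set-crate"})` gives `https://data.emobon.embrc.eu/my-set-crate/`. Hooking this into the yml loader as a custom tag is left out of the sketch.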

new component to build github-based rocrates metadata files #142
