Description
We want a more intuitive way to produce ro-crate-metadata.json files, based on the limited information and view a typical user has of their dataset plus the actual content, avoiding the need to fiddle with many technical details and syntax.
The output of this process (dubbed the sema.rocreator) will be a locally produced ./ro-crate-metadata.json file.
The input would be the combination of:
- ./** a root folder holding the content, which is also where the resulting json file will be placed
- ./rocreator.yml providing basic information driving the process
An initial draft of such a yml is included below, although further in-depth discussion is needed:
# we assume we can use the !resolve trick to inject env variables
# we have code for that already in sema.bench.core --> should become a shared commons thingy
# refers to a "strategy / way of working / set of assumptions and features" known to the rocreator
uses: emobon-observatory # ALSO: check whether yaml has a built-in extension mechanism that allows merging in some defaults
# main public identifier for the crate -- probably optional as this does not need to end up in the crate itself?
base: !resolve "{domain}/{repo}/" # to resolve to e.g. https://data.emobon.embrc.eu/my-set-crate/ (quoted, since a leading { would start a yaml flow mapping)
# optionally have same-as refs
sameas:
  - doi:4886060.34/whatever
# provide ro-crate specific hooks
ro:
  version: 1.2-DRAFT # the version to use
  profile: !resolve "{domain}/observatory-profile/latest" # the profile to comply with -- could be set as part of the "strategy-in-use"
# essential contextual entities
license: https://creativecommons.org/licenses/by/4.0/ # should probably be the default
# associated people and organisations
agents: # list them all - each key should be a valid @id to use, in the best case even dereferenceable
  # every nested key should be an orcid url --> alternative identifiers could be oceanexpert-id, url, edmoid, rorid, ...
  # per agent a uri-identifier should suffice - the rocreator impl could (again optionally) fetch associated triples and slot in more
  https://orcid.org/....: # First LastName --> note that yml comments can be slipped in to give visual cues to remember what this was about
    roles: [ owner, contributor, contact ] # selection of roles this agent is playing here
    sameas: # space to list possible other identifiers for the same agent
    key: value # extensible extra range of info one would want to add
# what should go into this crate
content:
  ignorecase: yes | no | tolower # how to handle the case of filenames to find and process?
  includes: # list of glob patterns to include -- defaults to **/* if none are provided
    - "**/*.csv"
    - "**/*.tsv"
    - "**/*.xml"
  excludes: # list of match patterns to exclude (evaluated after the includes, so they take precedence)
    - .*ignore # any of gitignore, dockerignore, ...
    - .github/ # all workflow stuff
  autofill: # content to be added to entities being inserted - a list of match patterns, with nested key-values
    - "*.csv":
        content-type: text/csv; charset=utf8
        key: value
  no-auto-fill: # disable autofill for certain matching patterns
    - http*
  list: # manual list of local and remote entities to be added as well; autofill gets applied to them, plus any additional extra key-values to overwrite
    - "./specialfile": # additional individual file
        field-name: field-value
    - https://whatever:
        key: value
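The include / exclude / autofill semantics described in the draft could be sketched as follows. This is only a minimal illustration under stated assumptions, not the rocreator implementation: the function names (`is_excluded`, `select_files`, `autofill_for`) are hypothetical, a trailing `/` in an exclude pattern is assumed to mean "everything below that folder", and other exclude and autofill patterns are assumed to be matched against both the relative path and the bare file name.

```python
from __future__ import annotations

from fnmatch import fnmatch
from pathlib import Path


def is_excluded(rel_path: str, excludes: list[str]) -> bool:
    """Return True if a relative POSIX path matches any exclude pattern."""
    name = rel_path.rsplit("/", 1)[-1]
    for pattern in excludes:
        if pattern.endswith("/"):
            # Assumption: a trailing slash excludes the whole folder.
            if rel_path.startswith(pattern):
                return True
        elif fnmatch(rel_path, pattern) or fnmatch(name, pattern):
            return True
    return False


def select_files(root: Path, includes: list[str] | None, excludes: list[str]) -> list[str]:
    """Collect files matching the include globs, then drop excluded ones."""
    selected = set()
    # The draft states that includes default to **/* when none are given.
    for pattern in includes or ["**/*"]:
        for path in root.glob(pattern):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                # Excludes are evaluated after includes, so they win.
                if not is_excluded(rel, excludes):
                    selected.add(rel)
    return sorted(selected)


def autofill_for(entity_id: str, rules: list[dict], no_autofill: list[str]) -> dict:
    """Merge the key-values of every autofill rule matching the entity."""
    if any(fnmatch(entity_id, pattern) for pattern in no_autofill):
        return {}
    name = entity_id.rsplit("/", 1)[-1]
    props: dict = {}
    for rule in rules:
        for pattern, values in rule.items():
            if fnmatch(name, pattern):
                props.update(values)
    return props
```

The `rules` argument mirrors the shape the `autofill` list would have after loading the yml: a list of single-key mappings from pattern to key-values. Handling of `ignorecase` is left out, since its semantics are still open.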
Use cases - to provide ro-crate-metadata.json for the
- basic content examples
- the emobon-profile crates
- the emobon-observatory crates
- ... others to consider?
Approach:
- target users who only have a content view on the dataset, not on the technical details of rocrates or rdf
- make things as simple and concise as possible for end-users (sensible defaults)
- design for extensibility -- allow for plugging in specific extra code to cater for other use cases / specific strategies
Way forward:
- I would like us to think about what to build first
- make a limited number of examples, each showing an input yml + input folder and the expected output they should lead to
- actually use those as test cases & documentation
- this and the actual dev work could materialise in a branch + PR associated with this issue
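To make the "examples as test cases" idea concrete, one such example could be pinned down roughly like this. It is a sketch under assumptions: `build_metadata` and `test_minimal_crate` are hypothetical names, and the context and `conformsTo` URLs are those of the published RO-Crate 1.1 spec rather than the 1.2-DRAFT mentioned in the yml draft.

```python
def build_metadata(files: list[str], license_url: str) -> dict:
    """Build a minimal RO-Crate metadata document for a list of files."""
    graph = [
        {
            # The metadata descriptor that points at the root dataset.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # The root dataset, listing every included file.
            "@id": "./",
            "@type": "Dataset",
            "license": {"@id": license_url},
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    # One File entity per included file.
    graph += [{"@id": f, "@type": "File"} for f in files]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def test_minimal_crate() -> None:
    """Example test: a known input must lead to the expected graph."""
    result = build_metadata(
        ["data/a.csv"], "https://creativecommons.org/licenses/by/4.0/"
    )
    ids = [entity["@id"] for entity in result["@graph"]]
    assert ids == ["ro-crate-metadata.json", "./", "data/a.csv"]
    assert result["@graph"][1]["hasPart"] == [{"@id": "data/a.csv"}]
```

In a real test suite the expected output would more likely live next to the input folder as a json file and be compared as a whole, so the same files double as documentation.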
See also the yml-resolve code at line 147 in 48ef7cb:
loader.add_constructor("!resolve", resolver)
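For readers unfamiliar with the mechanism, a `!resolve` constructor can be registered with PyYAML roughly as below. This is not the sema.bench.core code, whose resolver may behave differently. It is a minimal sketch in which `make_loader` is a hypothetical helper, the variables are passed in explicitly (they could equally be `os.environ`), and the template is quoted in the yml because a plain scalar starting with `{` would be parsed as a flow mapping.

```python
import yaml  # PyYAML (third-party)


def make_loader(variables: dict) -> type:
    """Return a SafeLoader subclass that knows the !resolve tag."""

    class ResolvingLoader(yaml.SafeLoader):
        """Private subclass so the tag does not leak into SafeLoader."""

    def resolver(loader, node):
        # Read the tagged scalar and fill in the {placeholders}.
        template = loader.construct_scalar(node)
        return template.format_map(variables)

    ResolvingLoader.add_constructor("!resolve", resolver)
    return ResolvingLoader


config_text = 'base: !resolve "{domain}/{repo}/"'
variables = {"domain": "https://data.emobon.embrc.eu", "repo": "my-set-crate"}
config = yaml.load(config_text, Loader=make_loader(variables))
```

With these variables, `config["base"]` resolves to the example URL given in the draft. A missing variable raises a `KeyError`, which seems preferable to silently emitting a half-resolved identifier.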