v1.0.0-beta

@FjodorvRijsselberg FjodorvRijsselberg released this 30 Jan 15:34
· 419 commits to main since this release
16bdd08

Beta v1.0.0

DataverseNL workflow

The DataverseNL workflow can be used to ingest metadata from the DataverseNL Dataverse instance into a target Dataverse instance. The XML metadata is harvested over OAI-PMH as oai_dc (Dublin Core), and the harvested metadata is ingested by a Prefect workflow. Every subverse in DataverseNL that contains social science data has its own entry-point workflow; all subverses use the same DataverseNL workflow for the actual ingestion of the dataset metadata.
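The entry-point pattern above can be sketched as follows. In the real project these functions are Prefect flows; the names and return values here are hypothetical, and the harvest step is stubbed:

```python
# Minimal sketch of one subverse entry point delegating to the shared
# DataverseNL ingestion workflow. All names are hypothetical.

def dataversenl_ingestion(subverse: str, records: list[str]) -> dict:
    """Shared ingestion workflow: every subverse entry point delegates here."""
    # In the real workflow, each record would be transformed and imported.
    return {"subverse": subverse, "ingested": len(records)}

def harvest_oai_dc(subverse: str) -> list[str]:
    """Stub for the OAI-PMH harvest of oai_dc records for one subverse."""
    return [f"<record subverse='{subverse}'/>"]

def social_science_entry() -> dict:
    """Entry-point workflow for one DataverseNL subverse with social science data."""
    records = harvest_oai_dc("social-science")
    return dataversenl_ingestion("social-science", records)
```

Keeping the ingestion logic in one shared workflow means a fix or change applies to every subverse at once; the entry points only select the source.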

The harvested data is transformed to JSON, the dataset ID is extracted, and that ID is used to fetch the full Dataverse JSON metadata. This metadata is then cleaned and imported into Dataverse. Finally, the publication date is updated and the dataset is published. All tasks call external services except for the cleaning step.
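The per-dataset pipeline described above can be sketched like this. All function names are hypothetical, and every external-service call is stubbed; only the shape of the steps comes from the description:

```python
# Hedged sketch of the per-dataset ingestion pipeline; stubs stand in for
# the external services.

def transform_to_json(xml_record: str) -> dict:
    # External service: oai_dc XML -> JSON (stubbed here).
    return {"id": "doi:10.0/stub", "source_xml": xml_record}

def fetch_dataverse_json(dataset_id: str) -> dict:
    # External service: fetch the Dataverse JSON metadata by ID (stubbed).
    return {"id": dataset_id, "title": "stub title", "emptyField": None}

def clean_metadata(metadata: dict) -> dict:
    # The only local step: drop fields the target Dataverse will not accept.
    return {k: v for k, v in metadata.items() if v is not None}

def ingest_dataset(xml_record: str) -> dict:
    doc = transform_to_json(xml_record)
    metadata = fetch_dataverse_json(doc["id"])
    metadata = clean_metadata(metadata)
    # Remaining external-service steps, omitted in this sketch:
    # import into Dataverse, update the publication date, publish.
    return metadata
```

Separating the cleaning step from the service calls keeps the one piece of local logic easy to test in isolation.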

File management

The entry workflows that data providers use to start the ingestion process have been put into their own directory, and the dataset ingestion workflows into another. Both live under the flows directory in scripts.

Workflow versioning

A URL pointing to the version dictionary of a specific workflow is added to the metadata of the ingested dataset, in a field of the provenance metadata block. The function that creates the dictionary is called in the entry-point workflow. You can specify which services a workflow uses; for every service the dictionary contains the latest GitHub release, the latest Docker image tag, the service version, and its endpoint.
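A minimal sketch of such a version dictionary, assuming one entry per service. The key names and helper functions here are hypothetical; only the four fields per service come from the description above:

```python
# Hedged sketch: build a workflow version dictionary with one entry per
# service used by the workflow. Helper and key names are hypothetical.

def service_version_entry(github_release: str, docker_tag: str,
                          version: str, endpoint: str) -> dict:
    """Version information for a single service."""
    return {
        "latest-github-release": github_release,
        "latest-docker-image-tag": docker_tag,
        "service-version": version,
        "endpoint": endpoint,
    }

def workflow_versioning(services: dict) -> dict:
    """Map each service name used by the workflow to its version entry."""
    return {name: service_version_entry(**info)
            for name, info in services.items()}
```

Usage: the entry-point workflow would call `workflow_versioning` with the services it uses, publish the result, and record its URL in the provenance metadata block.

```python
versions = workflow_versioning({
    "harvester": {
        "github_release": "v1.0.0",
        "docker_tag": "1.0.0",
        "version": "1.0.0",
        "endpoint": "http://localhost:8080",
    },
})
```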