v1.0.0-beta
DataverseNL workflow
The DataverseNL workflow can be used to ingest metadata from the DataverseNL Dataverse instance into a target Dataverse instance. The XML metadata is harvested over OAI-PMH as oai_dc (Dublin Core), and the harvested metadata is ingested using a Prefect workflow. Every subverse in DataverseNL that contains social science data has its own entry point workflow; all subverses use the same DataverseNL workflow for the actual ingestion of the dataset metadata.
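As a rough sketch of the harvesting step, the Python snippet below fetches one page of oai_dc records over OAI-PMH. The endpoint URL, set handling, and function are illustrative assumptions, not code from this repository.

    import requests

    # Assumed OAI-PMH endpoint of the DataverseNL instance (illustrative).
    OAI_ENDPOINT = "https://dataverse.nl/oai"

    def harvest_oai_dc(set_spec: str) -> str:
        """Fetch one page of oai_dc records for a given OAI set (subverse)."""
        response = requests.get(
            OAI_ENDPOINT,
            params={
                "verb": "ListRecords",
                "metadataPrefix": "oai_dc",
                "set": set_spec,
            },
            timeout=30,
        )
        response.raise_for_status()
        return response.text  # XML payload with the Dublin Core records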
The data is transformed to JSON, the dataset ID is extracted and used to fetch the Dataverse JSON metadata. This metadata is then cleaned and imported into Dataverse. Finally, the publication date is updated and the dataset is published. All tasks use external services except for the cleaning step.
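A minimal Prefect sketch of that sequence could look as follows; the task names and stub bodies are hypothetical placeholders for the steps described above, not the repository's actual flow.

    from prefect import flow, task

    @task
    def xml_to_json(xml_record: str) -> dict:
        # Transform the harvested oai_dc XML into JSON (stubbed here).
        return {"id": "doi:10.0000/example"}

    @task
    def fetch_dataverse_json(dataset_id: str) -> dict:
        # Fetch the full Dataverse JSON metadata for the dataset ID (stubbed).
        return {"datasetVersion": {}}

    @task
    def clean_metadata(metadata: dict) -> dict:
        # The only step that runs locally rather than in an external service.
        return metadata

    @task
    def import_and_publish(metadata: dict) -> None:
        # Import into the target Dataverse, update the publication date,
        # and publish the dataset (delegated to external services, stubbed).
        pass

    @flow
    def dataversenl_ingestion(xml_record: str) -> None:
        record = xml_to_json(xml_record)
        metadata = fetch_dataverse_json(record["id"])
        import_and_publish(clean_metadata(metadata))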
File management
The entry workflows that the data providers use to start the ingestion process are kept in a dedicated directory, and the dataset ingestion workflows in another; both live under the flows directory in scripts, as sketched below.
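Under that layout, the tree would look roughly like this; the two subdirectory names are assumptions for illustration:

    scripts/
        flows/
            entry_workflows/      # one entry point flow per data provider (subverse)
            dataset_workflows/    # shared ingestion flows such as the DataverseNL workflow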
Workflow versioning
A URL to the workflow version dictionary of a specific workflow is added to the metadata of the ingested dataset, in a field of the provenance metadata block. The function that creates the dictionary is called in the entry point workflow. You specify which services a workflow uses, and for every service the dictionary contains the latest GitHub release, the latest Docker image tag, the service version, and its endpoint.
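For illustration, the dictionary that the URL points to might have a shape like the following; the service name and values are invented for the example:

    # Illustrative shape of a workflow version dictionary; every service the
    # workflow uses gets one entry with the four facts described above.
    version_dict = {
        "dataverse-importer": {
            "github_release": "v1.2.0",       # latest GitHub release
            "docker_image_tag": "1.2.0",      # latest Docker image tag
            "service_version": "1.2.0",       # version reported by the service
            "endpoint": "https://example.org/importer",  # service endpoint
        },
        # ...one entry per service used by the workflow
    }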