Experimental CSV-W representation of data for MBO

Prerequisites

N.B. This requires that the build system has make and docker installed.

Data CSVs

End-users should ignore (or perhaps not see) all of the other files, which are currently stored in remote. Those files are nonetheless necessary for validating the data and converting it to an RDF representation.

For detailed information about the CSV files and definitions of the columns they contain, see class-descriptions.md.

Before doing anything

Make sure you run the init command first: it creates the right output directories and pulls the required docker images.

$ make init

Validating the data

CSV-W and Manual Foreign Key Constraints

$ make validate
=============================== Pulling latest required docker images. ===============================
...

=============================== Validating remote/dataset-metadata.json ===============================
Valid CSV-W

=============================== Validating remote/eov-metadata.json ===============================
Valid CSV-W

It will (hopefully) tell you if you get something wrong, for instance referencing an EOV which isn't defined.
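As a rough illustration of what a manual foreign key check involves, here is a minimal Python sketch. It is not the repository's own check: the function name, the column names (`id`, `eov_id`) and the sample rows are all hypothetical, chosen only to show the idea of flagging a reference to an EOV which isn't defined.

```python
import csv
import io


def find_dangling_references(parent_csv, child_csv, key_column, ref_column):
    """Return values in the child's ref_column that are missing from the parent's key_column."""
    parent_ids = {row[key_column] for row in csv.DictReader(io.StringIO(parent_csv))}
    dangling = []
    for row in csv.DictReader(io.StringIO(child_csv)):
        value = row[ref_column]
        # Empty cells are treated as "no reference" rather than as an error.
        if value and value not in parent_ids:
            dangling.append(value)
    return dangling


# Hypothetical sample data; the real CSV files use different columns.
EOV_CSV = "id,name\nmbo_eov_1,Sea surface temperature\n"
DATASET_CSV = "id,eov_id\nmbo_dataset_1,mbo_eov_1\nmbo_dataset_2,mbo_eov_9\n"

print(find_dangling_references(EOV_CSV, DATASET_CSV, "id", "eov_id"))
# prints ['mbo_eov_9']
```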

The SHACL Report

There are some forms of invalid data which can only be detected when looking at the data in its entirety. For instance, we check that an MBO Identifier hasn't been (accidentally) reused in different CSV files; further, we want to generate a report of entities which have been defined but don't seem to be referenced anywhere else in the dataset. These checks are expressed as SHACL constraints (see remote/shacl.ttl). Violations cause the build to fail; warnings do not.

$ make shacl-report

The SHACL Report:

First looking for any violations:

+----------+
| Conforms |
+----------+
|   True   |
+----------+


Now looking for any warnings or info:

+----------+
| Conforms |
+----------+
|  False   |
+----------+

+-----+----------+---------------------------+-------------+---------------------------+---------------------------+---------------------------+---------------------------+
| No. | Severity | Focus Node                | Result Path | Message                   | Component                 | Shape                     | Value                     |
+-----+----------+---------------------------+-------------+---------------------------+---------------------------+---------------------------+---------------------------+
| 1   | Warning  | https://w3id.org/marco-bo | -           | All entities should be re | SPARQLConstraintComponent | http://w3id.org/marco-bol | MBO Identifier 'mbo_todo_ |
|     |          | lo/mbo_todo_license_4     |             | ferenced somewhere else;  |                           | o/ShaclConstraints/Entit  | license_4' in License.csv |
|     |          |                           |             | this is a warning, it is  |                           | iesShouldBeReferenced     |  doesn't appear to be ref |
|     |          |                           |             | not enforced.             |                           |                           | erenced anywhere else.    |
|     |          |                           |             |                           |                           |                           |                           |
....

Pay attention to the Focus Node field which tells you which entity is the problem, as well as the Message and Value columns which tell you what the problem is.
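The two whole-dataset checks described in this section can be pictured with a small Python sketch. This is plain Python standing in for the idea only; the real checks are SHACL constraints in remote/shacl.ttl, and the function names and sample identifiers below (other than `mbo_todo_license_4`, taken from the report above) are hypothetical.

```python
from collections import defaultdict


def find_reused_ids(ids_by_file):
    """Map each identifier defined in more than one file to the files that define it."""
    files_by_id = defaultdict(set)
    for filename, ids in ids_by_file.items():
        for identifier in ids:
            files_by_id[identifier].add(filename)
    return {i: sorted(files) for i, files in files_by_id.items() if len(files) > 1}


def find_unreferenced_ids(ids_by_file, referenced_ids):
    """Return identifiers that are defined somewhere but never referenced."""
    defined = set().union(*ids_by_file.values())
    return sorted(defined - set(referenced_ids))


# Hypothetical sample: one identifier reused across files, one never referenced.
ids_by_file = {
    "License.csv": ["mbo_todo_license_4", "mbo_shared_1"],
    "Person.csv": ["mbo_person_1", "mbo_shared_1"],
}
referenced = ["mbo_person_1", "mbo_shared_1"]

print(find_reused_ids(ids_by_file))                    # would be a violation
print(find_unreferenced_ids(ids_by_file, referenced))  # would be a warning
```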

Generating schema.org JSON-LD representation

$ make
....

Running Everything Locally

Make sure you have installed docker on your machine. Installing Docker Desktop may be the easiest way.
Make sure that you have GNU Make, Bash and jq installed.
Make sure you have installed git on your machine, then clone the repository locally with the following command:

$ git clone https://github.com/marco-bolo/csv-to-json-ld.git

Make the changes to the CSV files inside the csv-to-json-ld folder.
Open a terminal in the repo you have just cloned and run the following:

$ make init check validate shacl-report

This may take a bit of time, as it will pull 4 substantially sized docker images from the internet, but it will eventually perform all of the validations (without actually generating the JSON-LD outputs).

Alternatively, you can run the whole process, which performs the checks and validation and generates the resulting JSON-LD files in a directory called 'out'.

$ make
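Before running the build, it can help to confirm that the tools listed in this section are on your PATH. The following is a minimal sketch, not part of the repository; the helper name `missing_tools` is hypothetical. It only checks that an executable with each name exists, so it cannot tell whether `make` is specifically GNU Make.

```python
import shutil

# Tool names taken from the prerequisites listed in this section.
REQUIRED_TOOLS = ["docker", "make", "bash", "jq", "git"]


def missing_tools(tools, which=shutil.which):
    """Return the tools for which no executable was found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```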

Speed build

If you want speedy outputs, have multiple cores at your disposal, and don't mind incoherently timed log outputs then consider running make with a degree of parallelism (p):

$ p=4 && make -j "$p" init && make -j "$p" validate shacl-report jsonld

Files are output in the out directory.
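If you would rather pick the degree of parallelism from the number of cores available, a small Python sketch along these lines could build the same two make invocations. The function name `parallel_build_commands` is hypothetical and this is only one way to do it; the targets are the ones shown in the command above.

```python
import os


def parallel_build_commands(jobs=None):
    """Build the two make invocations from the command above for a given job count."""
    if jobs is None:
        # os.cpu_count() may return None, so fall back to a single job.
        jobs = os.cpu_count() or 1
    j = str(jobs)
    return [
        ["make", "-j", j, "init"],
        ["make", "-j", j, "validate", "shacl-report", "jsonld"],
    ]


for command in parallel_build_commands(4):
    print(" ".join(command))
```

Each command list can then be passed to `subprocess.run(command, check=True)` from the repository root, in order, so that the second step only runs if init succeeds.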

https://w3id.org/marco-bolo/ConvertMboIdToNode

This is an identifier which is used in CSV-W metadata documents and is necessary due to limitations in the CSV on the Web standard. The CSV-W standard supports delimited list columns, but it only supports serialising them to RDF literals and does not allow them to point to RDF nodes. As a result, we use https://w3id.org/marco-bolo/ConvertMboIdToNode as the datatype in the CSV-W and later convert all of these literals into resource/node references in the build process. This process also prepends https://w3id.org/marco-bolo/ to the value in the column.

N.B. https://w3id.org/marco-bolo/ConvertIriToNode provides a similar function but more generally for IRIs.
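The literal-to-node conversion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the build's actual implementation: the dictionary shapes used for literals and nodes are hypothetical, and treating a ConvertIriToNode value as an IRI to be used unchanged is an assumption.

```python
MBO_PREFIX = "https://w3id.org/marco-bolo/"
CONVERT_MBO_ID = MBO_PREFIX + "ConvertMboIdToNode"
CONVERT_IRI = MBO_PREFIX + "ConvertIriToNode"


def convert_object(obj):
    """Turn a marker-typed literal into a node reference; leave anything else alone.

    Hypothetical shapes: a literal is {"value": ..., "datatype": ...},
    a node reference is {"id": ...}.
    """
    datatype = obj.get("datatype")
    if datatype == CONVERT_MBO_ID:
        # MBO identifiers get the MBO prefix put on the front.
        return {"id": MBO_PREFIX + obj["value"]}
    if datatype == CONVERT_IRI:
        # Assumption: the value is already a full IRI and is used as-is.
        return {"id": obj["value"]}
    return obj


literal = {"value": "mbo_todo_license_4", "datatype": CONVERT_MBO_ID}
print(convert_object(literal))
# prints {'id': 'https://w3id.org/marco-bolo/mbo_todo_license_4'}
```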

https://w3id.org/marco-bolo/InputMetadataDescription

This is an identifier which is used to internally track the parametadata describing who input the metadata about something, when it was done, etc.

https://w3id.org/marco-bolo/isResultOf

An internal MBO predicate which effectively provides an inverse of https://schema.org/result. This allows us to specify the relationship mbo:SomeAction schema:result mbo:SomeParaMetadata without having to modify create-action.csv or any of the outputs derived from it, which would create an unhelpfully complex build dependency graph. The resulting triple is represented in JSON-LD as a @reverse property.

Only to be used in a triple where the subject is an instance of https://w3id.org/marco-bolo/InputMetadataDescription and the object is an action defined in create-action.csv.
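To make the @reverse representation concrete, here is a minimal Python sketch of how such a node might look in JSON-LD and which triple it implies. The exact shape of the generated output is not shown in this README, so the node layout and both helper names are hypothetical; only the meaning of `@reverse` follows the JSON-LD specification.

```python
import json

SCHEMA_RESULT = "https://schema.org/result"
INPUT_METADATA_DESCRIPTION = "https://w3id.org/marco-bolo/InputMetadataDescription"


def para_metadata_node(para_metadata_id, action_id):
    """Build a JSON-LD node stating: <action_id> schema:result <para_metadata_id>."""
    return {
        "@id": para_metadata_id,
        "@type": INPUT_METADATA_DESCRIPTION,
        "@reverse": {SCHEMA_RESULT: {"@id": action_id}},
    }


def implied_triples(node):
    """List the (subject, predicate, object) triples implied by a node's @reverse map."""
    triples = []
    for predicate, subjects in node.get("@reverse", {}).items():
        if isinstance(subjects, dict):
            subjects = [subjects]
        for subject in subjects:
            # In a @reverse map the enclosing node is the object of the triple.
            triples.append((subject["@id"], predicate, node["@id"]))
    return triples


node = para_metadata_node(
    "https://w3id.org/marco-bolo/SomeParaMetadata",
    "https://w3id.org/marco-bolo/SomeAction",
)
print(json.dumps(node, indent=2))
print(implied_triples(node))
```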
