This project provides an automatic data quality assessment of Linked Open Data (LOD) repositories made publicly available by Cultural Heritage institutions. This work is part of the article "An automatic data quality approach to assess semantic data from cultural heritage institutions".
Candela, G. (2023). An automatic data quality approach to assess semantic data from cultural heritage institutions. Journal of the Association for Information Science and Technology, 74(7), 866–878. https://doi.org/10.1002/asi.24761
This work explores how the quality of Linked Open Data made available by Cultural Heritage institutions can be automatically assessed. The results can be useful for other institutions that wish to publish and assess their collections. The approach uses the sheXer tool to mine an RDF dataset and automatically generate Shape Expressions.
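To give an intuition for what mining shapes from RDF data means, here is a minimal standard-library sketch. It is not sheXer's implementation or API: the function `mine_shape`, the `threshold` parameter, and the sample triples (written with prefixed names for brevity) are all hypothetical and serve only to illustrate the idea of deriving constraints from how instances of a class actually use properties.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def mine_shape(triples, target_class, threshold=1.0):
    """Return {property: share of instances using it} for one class.

    Only properties used by at least `threshold` of the instances are kept.
    This is a toy illustration, not the algorithm implemented by sheXer.
    """
    # Instances are the subjects typed with the target class.
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    if not instances:
        return {}

    # For each property, collect the distinct instances that use it.
    users = defaultdict(set)
    for s, p, o in triples:
        if s in instances and p != RDF_TYPE:
            users[p].add(s)

    total = len(instances)
    return {
        p: len(subjects) / total
        for p, subjects in sorted(users.items())
        if len(subjects) / total >= threshold
    }


# Hypothetical sample data, not taken from any of the assessed datasets.
TRIPLES = [
    ("ex:e1", RDF_TYPE, "crm:E5_Event"),
    ("ex:e1", "crm:P7_took_place_at", "ex:p1"),
    ("ex:e1", "crm:P14_carried_out_by", "ex:a1"),
    ("ex:e2", RDF_TYPE, "crm:E5_Event"),
    ("ex:e2", "crm:P7_took_place_at", "ex:p2"),
]

shape = mine_shape(TRIPLES, "crm:E5_Event", threshold=0.5)
for prop, share in shape.items():
    print(f"{prop}: used by {share:.0%} of instances")
```

Lowering `threshold` keeps properties that only some instances use, which is one simple way to see how complete a dataset is for a given class.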
The following links provide a set of Shape Expression examples based on relevant Cultural Heritage LOD datasets that can be assessed using the ShEx2 Simple Online Validator tool. These Shape Expressions have been automatically generated using the public SPARQL endpoints.
A reproducible Jupyter Notebook is provided for the Austrian National Library dataset in order to show how to query the data based on the Europeana Data Model vocabulary.
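As a rough idea of what querying such a dataset can look like, the sketch below sends a SPARQL query to an endpoint using only the Python standard library. It is not the code from the notebook. The query itself (items typed as `edm:ProvidedCHO` with a `dc:title`), the helper names `build_request`, `rows` and `run_query`, and the endpoint URL are assumptions made for illustration; replace the placeholder URL with the dataset's actual SPARQL endpoint.

```python
import json
import urllib.parse
import urllib.request

# Illustrative query based on the Europeana Data Model (EDM) namespace.
EDM_QUERY = """
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?item ?title WHERE {
  ?item a edm:ProvidedCHO .
  ?item dc:title ?title .
}
LIMIT 5
"""


def build_request(endpoint, query):
    """Build a GET request that passes the query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened result rows."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return rows(json.load(resp))


# Usage, with a placeholder endpoint (an assumption, not a real service):
# for row in run_query("https://example.org/sparql", EDM_QUERY):
#     print(row["item"], row["title"])
```

Keeping request building and result parsing in separate functions makes it possible to check both without network access.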
The SPARQL query below was used to retrieve examples of events, including their actors and places, based on the CIDOC-CRM vocabulary in the World War I LOD dataset. An integrated Shape Expression covering the three elements was created as an illustrative example of how the value of a constraint can be defined by another shape. See the ShEx documentation for additional details.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cidoc-crm: <http://www.cidoc-crm.org/cidoc-crm/>
SELECT ?s ?actor ?place WHERE {
  ?s a cidoc-crm:E5_Event .
  ?s cidoc-crm:P14_carried_out_by ?actor .
  ?actor a cidoc-crm:E39_Actor .
  ?s cidoc-crm:P7_took_place_at ?place .
  ?place a cidoc-crm:E53_Place .
}
LIMIT 10
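The idea that the value of a constraint can be defined by another shape can be sketched in plain Python. This is a toy conformance check, not a ShEx validator and not the Shape Expression produced in this work. The `SHAPES` table, the `conforms` function, and the sample graph are hypothetical; the leading `@` loosely mimics the way ShEx refers to another shape.

```python
# Each shape maps a property to its expected value. A value starting with "@"
# is a reference to another shape that the property's values must conform to.
SHAPES = {
    "Event": {
        "rdf:type": "cidoc-crm:E5_Event",
        "cidoc-crm:P14_carried_out_by": "@Actor",
        "cidoc-crm:P7_took_place_at": "@Place",
    },
    "Actor": {"rdf:type": "cidoc-crm:E39_Actor"},
    "Place": {"rdf:type": "cidoc-crm:E53_Place"},
}


def conforms(graph, node, shape_name):
    """Return True if `node` satisfies every constraint of the named shape.

    Assumes the shape references contain no cycles (true for SHAPES above).
    """
    for prop, expected in SHAPES[shape_name].items():
        values = graph.get(node, {}).get(prop, [])
        if not values:
            return False  # every constraint needs at least one value
        if expected.startswith("@"):
            # The values are nodes that must conform to the referenced shape.
            if not all(conforms(graph, v, expected[1:]) for v in values):
                return False
        elif expected not in values:
            return False
    return True


# Hypothetical sample graph: node -> property -> list of values.
GRAPH = {
    "ex:event1": {
        "rdf:type": ["cidoc-crm:E5_Event"],
        "cidoc-crm:P14_carried_out_by": ["ex:actor1"],
        "cidoc-crm:P7_took_place_at": ["ex:place1"],
    },
    "ex:event2": {
        "rdf:type": ["cidoc-crm:E5_Event"],
        "cidoc-crm:P14_carried_out_by": ["ex:place1"],  # a place, not an actor
        "cidoc-crm:P7_took_place_at": ["ex:place1"],
    },
    "ex:actor1": {"rdf:type": ["cidoc-crm:E39_Actor"]},
    "ex:place1": {"rdf:type": ["cidoc-crm:E53_Place"]},
}

print(conforms(GRAPH, "ex:event1", "Event"))
print(conforms(GRAPH, "ex:event2", "Event"))
```

In this sketch `ex:event2` fails because the node linked through `cidoc-crm:P14_carried_out_by` does not conform to the `Actor` shape, even though the event itself is typed correctly.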
- Gustavo Candela: Towards a semantic approach in GLAM Labs: the case of the Data Foundry at the National Library of Scotland. CoRR abs/2301.11182 (2023). https://doi.org/10.48550/arXiv.2301.11182
- Daniel Fernández-Álvarez, José Emilio Labra-Gayo, Daniel Gayo-Avello. Automatic extraction of shapes using sheXer. Knowledge-Based Systems 238, 107975 (2022). https://doi.org/10.1016/j.knosys.2021.107975
- Gustavo Candela et al. A Shape Expression approach for assessing the quality of Linked Open Data in libraries. Semantic Web 14(2), 159–179 (2023). https://doi.org/10.3233/SW-210441