
SH2RDF

The purpose of the script csv2json.py is to parse CSV table-based input into JSON-LD that is linked to the schema.org vocabulary.

This is achieved by mapping the information given in each input row to fields in an empty, pre-defined JSON hierarchy (initial.json). Details are explained below.

In the end, a pre-defined context (context.json) is appended to the JSON to embed the information into the schema.org structure to the extent that this is possible. The result is one JSON-LD file (Klassiker.json...100.json) for each input table (Klassiker.csv...100.csv).

Usage

python csv2json.py ${inputfile} ${outputfile}

The script parse_all.sh can be used to transform all input files at once. Put your input files into the input folder - output files will be written to output. Some example files are provided.
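A hypothetical sketch of what parse_all.sh does under these conventions (the guard against an empty input folder is an addition for safety):

```shell
#!/bin/sh
# Sketch of parse_all.sh: convert every CSV in input/ to a JSON-LD
# file with the same base name in output/.
mkdir -p output
for f in input/*.csv; do
    [ -e "$f" ] || continue           # skip if input/ holds no CSV files
    base=$(basename "$f" .csv)
    python csv2json.py "$f" "output/$base.json"
done
```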

In order to make use of geodata, you need a local SQLite database built from https://www.geonames.org/ data; provide the path to this database as resources/allCountries.db. The database is not required for conversion: if you don't have it, simply comment out the respective lines in the script.
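A minimal sketch of such a geodata lookup, assuming the local database exposes a table named geonames with name, latitude, and longitude columns (these schema details are assumptions, not part of the script):

```python
import sqlite3

def lookup_place(db_path, place_name):
    """Resolve a place name to coordinates via the local GeoNames SQLite DB.

    Assumed schema: geonames(name TEXT, latitude REAL, longitude REAL).
    Returns a dict with latitude/longitude, or None if the name is unknown.
    """
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT latitude, longitude FROM geonames WHERE name = ? LIMIT 1",
            (place_name,),
        )
        row = cur.fetchone()
        return {"latitude": row[0], "longitude": row[1]} if row else None
    finally:
        con.close()
```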

Implementation Details

We assume the input data is in a column-based format.

As a row in a table-based dataset does not contain lists, each row constitutes a minimal piece of information.

We add a rowID column to the input data.

Here, each row contains information about one occupation_item for a certain person.

We define a preset structure for the output format and a context definition to map it to schema.org.

The preset structure only needs to hold the type information for objects/nodes.

For each column, we define a dictionary with empty fields that specifies which information can be found in this column and where to place it in the overall structure, e.g.:

{'smart_harvesting':{'':{'occupations':{'':{'institution_l2':{'@id':None,'location':None,'name':None}}}}}}

Then the parsing function determines how to find the values to fit into these fields.
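A hypothetical sketch of this instantiation step for the template shown above; the CSV column names used here (institution_l2_id, institution_l2_location, institution_l2) are illustrative assumptions:

```python
import copy

# Per-column template with empty fields, mirroring the example above.
TEMPLATE = {'smart_harvesting': {'': {'occupations': {'': {
    'institution_l2': {'@id': None, 'location': None, 'name': None}}}}}}

def instantiate(row):
    """Fill a deep copy of the template with values from one parsed CSV row."""
    obj = copy.deepcopy(TEMPLATE)
    inst = obj['smart_harvesting']['']['occupations']['']['institution_l2']
    inst['@id'] = row['institution_l2_id']          # assumed column name
    inst['location'] = row['institution_l2_location']  # assumed column name
    inst['name'] = row['institution_l2']               # assumed column name
    return obj
```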

The resulting instantiated object is 'added' onto the overall structure by means of a hierarchical update function merge().
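A minimal sketch of what such a hierarchical merge() could look like (the actual implementation in the script may differ):

```python
def merge(base, update):
    """Recursively merge `update` into `base`.

    Nested dictionaries are merged key by key, lists of primitives are
    concatenated without duplicates, and scalar values from `update`
    overwrite those in `base`.
    """
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge(base[key], value)
        elif isinstance(value, list) and isinstance(base.get(key), list):
            base[key].extend(v for v in value if v not in base[key])
        else:
            base[key] = value
    return base
```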

For this reason, we use lists only for primitive datatypes like strings, not for objects/nodes.

As each object needs an id value anyway, we can also use it as the key in the respective parent dictionary. At first, however, the keys are instantiated as "", since the id is only known after the row has been parsed.

Once we have read a line, we replace the empty "" key in the current row's representation with the value in the respective @id field that was filled during parsing.

After each row, we insert the final keys and merge the structure for this row with the global final representation object under construction.

We repeat until all input rows are processed and merge the result with the context definition to obtain the JSON-LD output.
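The final step can be sketched as follows, assuming context.json contains a schema.org "@context" mapping (the exact file layout is an assumption):

```python
import json

def to_jsonld(data, context_path):
    """Combine the merged data with the context definition from context.json.

    The context keys (e.g. "@context") are placed alongside the data,
    yielding the final JSON-LD document.
    """
    with open(context_path) as f:
        context = json.load(f)
    return {**context, **data}
```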
