-
Notifications
You must be signed in to change notification settings - Fork 2
Examples
Wiki ▸ Examples
Let's assume that we must build a spreadsheet table of unemployment rate by country (columns) and year (rows).
You'll need to find the Eurostat dataset id. Go to
http://ec.europa.eu/eurostat/data/database
and then
Tables on EU policy
> Employment performance monitor - indicators (tesem)
> Unemployment rate (tesem120)
Connect to the JSON-stat Eurostat API to retrieve dataset tesem120:
http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1
You can view the contents of the dataset at
To download the dataset from the command line using Eurostat's API run cURL:
curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 -o unr.jsonstat
JSON-stat keeps data and metadata completely apart because it is designed to be an efficient transport format. That probably makes it human-unfriendly. Fortunately, we can use jsonstat-conv to convert JSON-stat to other formats.
If you don't have jsonstat-conv installed in your system or have a version that is lower than 0.5.4, please (re)install it:
npm install -g jsonstat-conv
To check your version of jsonstat-conv:
jsonstat2csv --version
jsonstat-conv includes a command to translate JSON-stat directly into CSV (jsonstat2csv). For example,
jsonstat2csv unr.jsonstat unr-comma.csv
will produce a comma-delimited CSV using dot as the decimal mark (unr-comma.csv).
If you need a semicolon-delimited CSV with comma as the decimal mark, use instead:
jsonstat2csv unr.jsonstat unr-semi.csv --column ";"
jsonstat2csv comes with several options that allow us to customize the output. For example, to include status information and rename the "Status" and "Value" columns to "symbol" and "rate":
jsonstat2csv unr.jsonstat unr-status.csv --status --vlabel rate --slabel symbol
jsonstat2csv, though, does not include options to change the structure of the data. These features are available only in jsonstat2arrobj which is the main way to export JSON-stat to other formats. In particular, jsonstat2arrobj converts JSON-stat into an array of objects (arrobj, from now on), which is probably the most popular structure used to exchange data in JSON. As a consequence, there are many tools to transform arrobjs into other JSON flavors or even to convert them to CSVs.
So instead of converting Eurostat's JSON-stat directly to CSV using jsonstat2csv we are going to translate it to an arrobj using jsonstat2arrobj and then convert it to CSV using open source tools.
To convert JSON-stat into an array of objects:
jsonstat2arrobj unr.jsonstat unr.json
As you probably noticed, the unr.json does not have the structure we need in our spreadsheet (country in columns and year in rows). By default, jsonstat2arrobj stores the value for each combination of categories in a property called value:
[
{
"unit": "Percentage of active population",
"sex": "Total",
"age": "Total",
"time": "2005",
"geo": "Austria",
"value": 5.6
},
...
]
Instead, we need to have a property for each category of the geo dimension (where the country information is) to store their values for each combination of the categories of the rest of the dimensions:
[
{
"unit": "Percentage of active population",
"sex": "Total",
"age": "Total",
"time": "2005",
"AT": 5.6,
"BE": 8.5,
"BG": 10.1,
"CY": 5.3,
...
},
...
]
That is, we need to transpose the values by geo:
jsonstat2arrobj unr.jsonstat unr-transp.json --by geo
Dataset tesem120 contains several single-category dimensions: sex and age are always "Total" and unit is always "Percentage of active population". We can remove them from the transposed JSON:
jsonstat2arrobj unr.jsonstat unr-drop.json --by geo --drop sex,age,unit
For this task, there are many tools out there. We will be using d3-dsv, which includes a json2csv command.
npm install -g d3-dsv
To convert our last JSON to CSV:
json2csv < unr-drop.json > unr.csv
The resulting CSV has comma as the column delimiter and dot as the decimal mark. If what you need is a CSV with comma as the decimal mark, first you must use the jsonstat2arrobj --comma (decimal mark) option:
jsonstat2arrobj unr.jsonstat unr-comma.json --by geo --drop sex,age,unit --comma
And then specify in json2csv a different column delimiter (for example, a semicolon):
json2csv < unr-comma.json > unr-comma.csv -w ";"
All the process has required three lines and three files (unr.jsonstat, unr-drop.json, unr.csv):
curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 -o unr.jsonstat
jsonstat2arrobj unr.jsonstat unr-drop.json --by geo --drop sex,age,unit
json2csv < unr-drop.json > unr.csv
This is not necessary, though: all the process can be done in a single line, piping the output of a program to another program. For that, we need to enable jsonstat2arrobj stream interface (--stream) which will allow us to use pipes (|) and redirects (<, >).
In the stream interface, this command
jsonstat2arrobj unr.jsonstat unr.json
must be rewritten as
jsonstat2arrobj < unr.jsonstat > unr.json --stream
So to get a comma-delimited CSV with dot as the decimal mark in a single line:
curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj --by geo --drop sex,age,unit --stream | json2csv > unr.csv
Or a little shorter:
curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj -b geo -d sex,age,unit -t | json2csv > unr.csv
And to get a semicolon-delimited CSV with comma as the decimal mark:
curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj -b geo -d sex,age,unit -k -t | json2csv > unr-comma.csv -w ";"