Skip to content

Examples

Xavier Badosa edited this page Jan 28, 2017 · 37 revisions

WikiExamples

Let's assume that we must build a spreadsheet table of unemployment rate by country (columns) and year (rows).

Sample spreadsheet

Steps

1. Retrieve the unemployment rate by country in recent years from Eurostat

You'll need to find the Eurostat dataset id. Go to

http://ec.europa.eu/eurostat/data/database

and then

Tables on EU policy
  > Employment performance monitor - indicators (tesem)
    > Unemployment rate (tesem120)

Connect to the JSON-stat Eurostat API to retrieve dataset tesem120:

http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1

You can view the contents of the dataset at

http://json-stat.com/explorer/#/http%3A%2F%2Fec.europa.eu%2Feurostat%2Fwdds%2Frest%2Fdata%2Fv2.1%2Fjson%2Fen%2Ftesem120%3Fprecision%3D1

To download the dataset from the command line using Eurostat's API run cURL:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 -o unr.jsonstat

JSON-stat keeps data and metadata completely apart because it is designed to be an efficient transport format. That probably makes it human-unfriendly. Fortunately, we can use jsonstat-conv to convert JSON-stat to other formats.

2. Convert JSON-stat to a more popular JSON data structure

If you don't have jsonstat-conv installed in your system or have a version that is lower than 0.5.4, please (re)install it:

npm install -g jsonstat-conv

To check your version of jsonstat-conv:

jsonstat2csv --version

jsonstat-conv includes a command to translate JSON-stat directly into CSV (jsonstat2csv). For example,

jsonstat2csv unr.jsonstat unr-comma.csv

will produce a comma-delimited CSV using dot as the decimal mark (unr-comma.csv).

If you need a semicolon-delimited CSV with comma as the decimal mark, use instead:

jsonstat2csv unr.jsonstat unr-semi.csv --column ";"

jsonstat2csv comes with several options that allow us to customize the output. For example, to include status information and rename the "Status" and "Value" columns to "symbol" and "rate":

jsonstat2csv unr.jsonstat unr-status.csv --status --vlabel rate --slabel symbol

jsonstat2csv, though, does not include options to change the structure of the data. These features are available only in jsonstat2arrobj which is the main way to export JSON-stat to other formats. In particular, jsonstat2arrobj converts JSON-stat into an array of objects (arrobj, from now on), which is probably the most popular structure used to exchange data in JSON. As a consequence, there are many tools to transform arrobjs into other JSON flavors or even to convert them to CSVs.

So instead of converting Eurostat's JSON-stat directly to CSV using jsonstat2csv we are going to translate it to an arrobj using jsonstat2arrobj and then convert it to CSV using open source tools.

To convert JSON-stat into an array of objects:

jsonstat2arrobj unr.jsonstat unr.json

3. Customize the array of objects

As you probably noticed, the unr.json does not have the structure we need in our spreadsheet (country in columns and year in rows). By default, jsonstat2arrobj stores the value for each combination of categories in a property called value:

[
  {
    "unit": "Percentage of active population",
    "sex": "Total",
    "age": "Total",
    "time": "2005",
    "geo": "Austria",
    "value": 5.6
  },
  ...
]

Instead, we need to have a property for each category of the geo dimension (where the country information is) to store their values for each combination of the categories of the rest of the dimensions:

[
  {
    "unit": "Percentage of active population",
    "sex": "Total",
    "age": "Total",
    "time": "2005",
    "AT": 5.6,
    "BE": 8.5,
    "BG": 10.1,
    "CY": 5.3,
    ...
  },
  ...
]

That is, we need to transpose the values by geo:

jsonstat2arrobj unr.jsonstat unr-transp.json --by geo

Dataset tesem120 contains several single-category dimensions: sex and age are always "Total" and unit is always "Percentage of active population". We can remove them from the transposed JSON:

jsonstat2arrobj unr.jsonstat unr-drop.json --by geo --drop sex,age,unit

4. Convert JSON to CSV

For this task, there are many tools out there. We will be using d3-dsv, which includes a json2csv command.

npm install -g d3-dsv

To convert our last JSON to CSV:

json2csv < unr-drop.json > unr.csv

The resulting CSV has comma as the column delimiter and dot as the decimal mark. If what you need is a CSV with comma as the decimal mark, first you must use the jsonstat2arrobj --comma (decimal mark) option:

jsonstat2arrobj unr.jsonstat unr-comma.json --by geo --drop sex,age,unit --comma

And then specify in json2csv a different column delimiter (for example, a semicolon):

json2csv < unr-comma.json > unr-comma.csv -w ";"

5. Altogether now

All the process has required three lines and three files (unr.jsonstat, unr-drop.json, unr.csv):

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 -o unr.jsonstat

jsonstat2arrobj unr.jsonstat unr-drop.json --by geo --drop sex,age,unit

json2csv < unr-drop.json > unr.csv

This is not necessary, though: all the process can be done in a single line, piping the output of a program to another program. For that, we need to enable jsonstat2arrobj stream interface (--stream) which will allow us to use pipes (|) and redirects (<, >).

In the stream interface, this command

jsonstat2arrobj unr.jsonstat unr.json

must be rewritten as

jsonstat2arrobj < unr.jsonstat > unr.json --stream

So to get a comma-delimited CSV with dot as the decimal mark in a single line:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj --by geo --drop sex,age,unit --stream | json2csv > unr.csv

Or a little shorter:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj -b geo -d sex,age,unit -t | json2csv > unr.csv

And to get a semicolon-delimited CSV with comma as the decimal mark:

curl http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/tesem120?precision=1 | jsonstat2arrobj -b geo -d sex,age,unit -k -t | json2csv > unr-comma.csv -w ";"
Clone this wiki locally