Programmatic curation of Glottography datasets
Install via pip from PyPI:
pip install pyglottography
Note: We use GDAL's ogr2ogr command to convert between
GeoJSON and GeoPackage formats. Thus, some functionality of pyglottography requires a working
GDAL installation.
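To check that GDAL is usable before relying on the conversion, a minimal sketch along the following lines may help. It is not how pyglottography itself invokes ogr2ogr; the helper names `ogr2ogr_command` and `convert` are hypothetical and the file paths are only examples. It assumes the standard ogr2ogr calling convention (`-f <driver> <destination> <source>`).

```python
import shutil
import subprocess
from pathlib import Path


def ogr2ogr_command(src, dst):
    """Build an ogr2ogr call converting src to dst; the driver is chosen from dst's suffix."""
    drivers = {".gpkg": "GPKG", ".geojson": "GeoJSON"}
    suffix = Path(dst).suffix.lower()
    if suffix not in drivers:
        raise ValueError(f"unsupported output format: {suffix}")
    # ogr2ogr expects the destination before the source.
    return ["ogr2ogr", "-f", drivers[suffix], str(dst), str(src)]


def convert(src, dst):
    """Run the conversion, failing early if ogr2ogr is not on the PATH."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found; a working GDAL installation is required")
    subprocess.run(ogr2ogr_command(src, dst), check=True)
```

Calling `convert("raw/dataset.geojson", "dataset.gpkg")` would then write a GeoPackage next to the project, or raise a clear error if GDAL is missing.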
pyglottography provides a cldfbench project template,
which can be used with the cldfbench new command:
cldfbench new --template glottography
The cldfbench workflow uses data in a project's raw directory - enriched with information from
etc - to create a CLDF dataset in the cldf directory. By default, pyglottography expects input
data as follows:
- Geo-data, i.e. shapes for languoid areas, is expected in a GeoJSON file
  raw/dataset.geojson. Each feature in this GeoJSON file should have a unique value for the id property.
- Metadata about the shapes is expected in a CSV file
  etc/features.csv. This file must have an id column with values corresponding to the feature ids in the geo-data.
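The two requirements above (unique feature ids, and matching ids in the metadata file) can be checked before running the workflow. The following is a rough sketch and not part of pyglottography. The function name `check_ids` is hypothetical, and it assumes that the id is stored in each feature's `properties` object rather than as a top-level feature member.

```python
import csv
import io
import json
from collections import Counter


def check_ids(geojson_text, csv_text):
    """Report id problems between GeoJSON features and the metadata CSV."""
    features = json.loads(geojson_text)["features"]
    # Assumption: the id lives in each feature's "properties" object.
    # Ids are compared as strings because CSV values are always strings.
    geo_ids = [str(f["properties"]["id"]) for f in features]
    csv_ids = [row["id"] for row in csv.DictReader(io.StringIO(csv_text))]
    return {
        "duplicate_geo_ids": sorted(i for i, n in Counter(geo_ids).items() if n > 1),
        "missing_in_csv": sorted(set(geo_ids) - set(csv_ids)),
        "missing_in_geojson": sorted(set(csv_ids) - set(geo_ids)),
    }
```

In a project directory, the two arguments would be the contents of raw/dataset.geojson and etc/features.csv, read as UTF-8 text. Three empty lists in the result mean the inputs are consistent.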
While metadata could be read entirely from the properties object of features in the GeoJSON file,
pyglottography looks up the metadata in a different file to allow for more transparent curation.
Since the Glottolog language catalog is released in a new version about twice a year, it is necessary
to be able to recreate a Glottography dataset with updated Glottocodes. With the raw data setup as
implemented in pyglottography, this only requires changes in etc/features.csv, which can easily
be tracked with versioning software such as git.
With the input data in place, the CLDF dataset is (re)created by running:
cldfbench makecldf cldfbench_<dsid>.py