Derivation of characteristic physioclimatic regions through density-based spatial clustering of high-dimensional data

This repository supplements the manuscript by Sebastian Lehner, Katharina Enigl, and Matthias Schlögl: Derivation of characteristic physioclimatic regions through density-based spatial clustering of high-dimensional data.

Highlights

Geospatial clusters are derived from gridded climate and terrain data

Flexible nonparametric methods are used in a sequential workflow

Workflow steps: dimensionality reduction, clustering, feature importance assessment

Effects of hyperparameter settings on the clustering result are discussed

Validation is performed by means of nested resampling and synoptic plausibility

Workflow

flowchart TD
PREPROCESSING-->ML
subgraph PREPROCESSING
    direction TB
    A[(meteorological data)]
    A-->AT(temperature)
    A-->AP(precipitation)
    A-->AS(snow)
    A-->AR(radiation)
    A-->AE(evapotranspiration)
    AT-->CI("climate indices\n(annual resolution)")
    AP-->CI
    AR-->CI
    AE-->CI
    AS-->CI
    CI-->|temporal aggregation|CIagg("climate indices for\nclimate reference periods")
    B[(geomorphometric data)]
    B--->GI1("geomorphometric indices\n(10m)")
    GI1-->|spatial aggregation|GI2("geomorphometric indices\n(1km)")
    CIagg-->GCD("unified multivariate\nphysioclimatic dataset")
    GI2-->GCD
    GCD-->|correlation analysis|GCDcor{{"reduced multivariate\nphysioclimatic dataset"}}
end
subgraph ML
    direction LR
    CLUSTERING --> FEATURE-EXPLANATION
end
subgraph CLUSTERING
    direction TB
    GCDcl[("reduced multivariate\nphysioclimatic dataset")]-->PCA(PCA:<br>linear dimension reduction)
    PCA-->UMAP(UMAP:<br>non-linear dimension reduction)
    UMAP-->HDB(HDBSCAN:<br>clustering in UMAP subspace)
    HDB-->GCC{{physioclimatic clusters}}
end
subgraph FEATURE-EXPLANATION
    direction TB
    GCDfe[("reduced multivariate\nphysioclimatic dataset")]-->RF(random forest)
    GCCfe[("physioclimatic clusters")]-->RF
    RF-->PFI{{permutation feature importance}}
    RF-->NESTEDCV{{performance estimation<br>through nested resampling}}
end
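
The feature-explanation step in the flowchart can be illustrated with a minimal, standard-library-only sketch of permutation feature importance. It is not the code used in this repository: a nearest-centroid classifier stands in for the random forest, the importance is computed on the training data rather than within nested resampling, and the toy values, the feature order and all function names are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Stand-in model (NOT a random forest): per-cluster mean of each feature."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign a row to the cluster with the nearest centroid (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])),
    )


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted cluster matches the given label."""
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 separates the two clusters, feature 1 does not.
X = [
    [0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.3, 0.1],
    [5.0, 0.0], [5.1, 0.1], [5.2, 0.0], [5.3, 0.1],
]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]

model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)
print(importances)
```

Shuffling the separating feature lowers the accuracy, while shuffling the uninformative one leaves it unchanged, so only the first feature receives a positive importance.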

Data

The following data sets have been used:

SPARTACUS (doi:10.1007/s00704-015-1411-4, doi:10.1007/s00704-017-2093-x)
WINFORE (doi:10.5194/hess-20-1211-2016)
SNOWGRID (doi:10.3390/atmos11121330)
a digital terrain model derived from airborne laser scanning (ALS-DTM)

All climate data sets are available through the GeoSphere Austria Data Hub. The elevation data set is available through the Austrian Open Government Data Platform data.gv.at.

See doc/features.md for details on feature definition.
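
The workflow reduces the unified dataset through correlation analysis before clustering. One common way to do this is a greedy filter on pairwise Pearson correlations, sketched below with the standard library only. The threshold of 0.9, the greedy keep-first rule, the feature names and the toy values are assumptions made for illustration, not the settings used in the manuscript.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already kept feature is <= threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values for five grid cells (not data from this repository).
features = {
    "mean_temperature": [9.1, 7.4, 4.2, 1.0, -1.5],
    "elevation": [200.0, 480.0, 1010.0, 1540.0, 1990.0],
    "annual_precipitation": [1100.0, 700.0, 1400.0, 900.0, 1300.0],
}

reduced = drop_correlated(features)
print(reduced)  # ['mean_temperature', 'annual_precipitation']
```

In this toy example elevation is dropped because it is almost perfectly anti-correlated with mean temperature. Which member of a correlated pair survives depends on the iteration order, so the order of the features matters in this greedy variant.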

Repo structure

This repo is loosely based on the Cookiecutter Data Science project template.

.
├── dat          # data sets
│   ├── interim  # interim data sets
│   ├── output   # final output
│   └── raw      # raw, immutable input data
├── dev          # scripts 
├── doc          # documentation
├── renv         # reproducible environments (R)
└── plt          # plots

Reproducibility

Python: Run conda env create -f environment.yml to create the conda environment subregion-derivation.
R: The R environment can be restored from the renv.lock using renv::restore().

Geosphere-Austria/subregion-derivation
