Derivation of characteristic physioclimatic regions through density-based spatial clustering of high-dimensional data

Geosphere-Austria/subregion-derivation

DOI paper DOI data Python code style: black R code style: tidyverse

This repository supplements the manuscript by Sebastian Lehner, Katharina Enigl, and Matthias Schlögl: Derivation of characteristic physioclimatic regions through density-based spatial clustering of high-dimensional data.

Highlights

  • Geospatial clusters are derived from gridded climate and terrain data
  • Flexible nonparametric methods are used in a sequential workflow
  • Workflow steps: dimensionality reduction, clustering, feature importance assessment
  • Effects of hyperparameter settings on the clustering result are discussed
  • Validation is performed by means of nested resampling and synoptic plausibility

Workflow

flowchart TD
PREPROCESSING-->ML
subgraph PREPROCESSING
    direction TB
    A[(meteorological data)]
    A-->AT(temperature)
    A-->AP(precipitation)
    A-->AS(snow)
    A-->AR(radiation)
    A-->AE(evapotranspiration)
    AT-->CI("climate indices\n(annual resolution)")
    AP-->CI
    AR-->CI
    AE-->CI
    AS-->CI
    CI-->|temporal aggregation|CIagg("climate indices for\nclimate reference periods")
    B[(geomorphometric data)]
    B--->GI1("geomorphometric indices\n(10m)")
    GI1-->|spatial aggregation|GI2("geomorphometric indices\n(1km)")
    CIagg-->GCD("unified multivariate\nphysioclimatic dataset")
    GI2-->GCD
    GCD-->|correlation analysis|GCDcor{{"reduced multivariate\nphysioclimatic dataset"}}
end
subgraph ML
    direction LR
    CLUSTERING --> FEATURE-EXPLANATION
end
subgraph CLUSTERING
    direction TB
    GCDcl[("reduced multivariate\nphysioclimatic dataset")]-->PCA(PCA:<br>linear dimension reduction)
    PCA-->UMAP(UMAP:<br>non-linear dimension reduction)
    UMAP-->HDB(HDBSCAN:<br>clustering in UMAP subspace)
    HDB-->GCC{{physioclimatic clusters}}
end
subgraph FEATURE-EXPLANATION
    direction TB
    GCDfe[("reduced multivariate\nphysioclimatic dataset")]-->RF(random forest)
    GCCfe[("physioclimatic clusters")]-->RF
    RF-->PFI{{permutation feature importance}}
    RF-->NESTEDCV{{performance estimation<br>through nested resampling}}
end
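
The FEATURE-EXPLANATION branch of the chart can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the implementation used in this repository: the data are synthetic, and the fold counts, the hyperparameter grid and the number of permutation repeats are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in (hypothetical): X plays the role of the physioclimatic
# features, y the role of the cluster labels obtained from the clustering step.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: hyperparameter tuning (grid chosen for illustration only).
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, None]},
    cv=inner_cv,
)

# Outer loop: performance estimate for the whole tuning procedure.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print(f"nested CV accuracy: {outer_scores.mean():.2f} +/- {outer_scores.std():.2f}")

# Permutation feature importance of the tuned model, computed on the
# training data here for brevity; held-out data would be preferable.
search.fit(X, y)
pfi = permutation_importance(
    search.best_estimator_, X, y, n_repeats=5, random_state=0
)
ranking = pfi.importances_mean.argsort()[::-1]
print("features ranked by importance:", ranking.tolist())
```

Wrapping the grid search inside the outer cross-validation keeps the data used for tuning separate from the data used for scoring, so the reported accuracy is not inflated by the hyperparameter selection.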
Data

The following data sets have been used:

All climate data sets are available through the GeoSphere Austria Data Hub. The elevation data set is available through the Austrian Open Government Data Platform data.gv.at.

See doc/features.md for details on feature definition.

Repo structure

This repo is loosely based on the Cookiecutter Data Science template.

.
├── dat          # data sets
│   ├── interim  # interim data sets
│   ├── output   # final output
│   └── raw      # raw, immutable input data
├── dev          # scripts 
├── doc          # documentation
├── renv         # reproducible environments (R)
└── plt          # plots

Reproducibility

  • Python: Run conda env create -f environment.yml to create the conda environment subregion-derivation.
  • R: Restore the R environment from renv.lock using renv::restore().
