Probability forecasts (#207)

dnerini · web-flow · commit 88fa8194c59d · 2021-07-04T12:38:56.000+02:00
* Initial commit

* Run black

* Minor fixes

* Rewrite with convolutions

* Documentation

* Clip values btw 0 and 1

To avoid rounding errors

* Add basic testing

* Cleanup test

* Remove duplicated line

* Improve docstrings

* Workaround for TypeError issue with Pillow 8.3

* Small refactoring

Improve docstrings/comments; assign threshold value to missing data

* Add constraints to 'timesteps' argument

* Use >= thr instead of > thr

* More testing

* Add example

* Fix black

* Fix example

* Add lagrangian probability method

* Improve docstrings
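
The core idea behind the new method (threshold the extrapolated field, then average the binary field over a neighbourhood while normalising by the number of valid pixels) can be sketched without pysteps. This is only a minimal illustration under stated assumptions: it uses a square window instead of the circular kernel in this commit, it skips the advection step, and `neighbourhood_probability` with its parameters is a hypothetical name, not part of the pysteps API.

```python
import numpy as np


def neighbourhood_probability(field, threshold, size):
    """Fraction of valid pixels >= threshold in a size x size window.

    Hypothetical helper for illustration only; not part of pysteps.
    `size` is assumed to be a positive odd integer.
    """
    field = np.asarray(field, dtype=float)
    valid = ~np.isnan(field)
    # Give missing pixels a value below the threshold so they never count
    # as exceedances.
    filled = np.where(valid, field, threshold - 1)
    exceed = (filled >= threshold).astype(float)

    pad = size // 2
    # Zero padding mimics a convolution with mode="same".
    exceed_p = np.pad(exceed, pad)
    valid_p = np.pad(valid.astype(float), pad)

    rows, cols = field.shape
    hits = np.zeros_like(exceed)
    count = np.zeros_like(exceed)
    # Summing shifted copies is equivalent to convolving with a box kernel.
    for dy in range(size):
        for dx in range(size):
            hits += exceed_p[dy : dy + rows, dx : dx + cols]
            count += valid_p[dy : dy + rows, dx : dx + cols]

    prob = np.full(field.shape, np.nan)
    # Normalise by the number of valid pixels covered by the window.
    np.divide(hits, count, out=prob, where=count > 0)
    # Keep missing input pixels missing in the output.
    prob[~valid] = np.nan
    return prob


# Small synthetic field: a 2 x 2 block of rain inside a 4 x 4 domain.
field = np.zeros((4, 4))
field[1:3, 1:3] = 1.0
prob = neighbourhood_probability(field, threshold=0.5, size=3)
```

With a 3 x 3 window, a pixel in the corner of the domain is normalised by the four in-domain pixels rather than by nine. This is the same edge treatment that the commit obtains by convolving the valid-pixel mask with the kernel.
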
diff --git a/doc/source/pysteps_reference/nowcasts.rst b/doc/source/pysteps_reference/nowcasts.rst
@@ -8,6 +8,7 @@ Implementation of deterministic and ensemble nowcasting methods.
 .. automodule:: pysteps.nowcasts.interface
 .. automodule:: pysteps.nowcasts.anvil
 .. automodule:: pysteps.nowcasts.extrapolation
+.. automodule:: pysteps.nowcasts.lagrangian_probability
 .. automodule:: pysteps.nowcasts.sprog
 .. automodule:: pysteps.nowcasts.sseps
 .. automodule:: pysteps.nowcasts.steps
diff --git a/doc/source/references.bib b/doc/source/references.bib
@@ -84,6 +84,17 @@ @ARTICLE{GZ2002
   DOI = "10.1175/1520-0493(2002)130<2859:SDOTPO>2.0.CO;2"
 }
 
+@ARTICLE{GZ2004,
+  AUTHOR = "U. Germann and I. Zawadzki",
+  TITLE = "Scale Dependence of the Predictability of Precipitation from Continental Radar Images. {P}art {II}: Probability Forecasts",
+  JOURNAL = "Journal of Applied Meteorology",
+  VOLUME = 43,
+  NUMBER = 1,
+  PAGES = "74--89",
+  YEAR = 2004,
+  DOI = "10.1175/1520-0450(2004)043<0074:SDOTPO>2.0.CO;2"
+}
+
 @ARTICLE{Her2000,
   AUTHOR = "H. Hersbach",
   TITLE = "Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems",
diff --git a/doc/source/user_guide/machine_learning_pysteps.rst b/doc/source/user_guide/machine_learning_pysteps.rst
@@ -7,16 +7,16 @@ How to correctly compare the accuracy of machine learning against traditional no
 Before starting the comparison, you need to ask yourself what is the objective of nowcasting:
 
 #. Do you only want to minimize prediction errors?
-#. Do you also want to represent the prediction uncertainty? 
+#. Do you also want to represent the prediction uncertainty?
 
 To achieve objective 1, it is sufficient to produce a single deterministic nowcast that filters out the unpredictable small-scale precipitation features.
 However, this will create a nowcast that will become increasingly smooth over time.
 
 To achieve objective 2, you need to produce a probabilistic or an ensemble nowcast (several ensemble members and realizations).
 
-In weather forecasting (and nowcasting) we usually want to achieve both goals because it is impossible to predict the evolution of a chaotic system with 100% accuracy, especially space-time precipitation fields and thunderstorms! 
+In weather forecasting (and nowcasting) we usually want to achieve both goals because it is impossible to predict the evolution of a chaotic system with 100% accuracy, especially space-time precipitation fields and thunderstorms!
 
-Machine learning and pysteps offer several methods to produce both deterministic and probabilistic nowcasts. 
+Machine learning and pysteps offer several methods to produce both deterministic and probabilistic nowcasts.
 Therefore, if you want to compare machine learning-based nowcasts to simpler extrapolation-based models, you need to select the right method.
 
 1. Deterministic nowcasting
@@ -27,7 +27,7 @@ Deterministic nowcasts can be divided into:
 a. Variance-preserving nowcasts, such as extrapolation nowcasts by Eulerian and Lagrangian persistence.
 b. Error-minimization nowcasts, such as machine learning, Fourier-filtered and ensemble mean nowcasts.
 
-**Very important**: these two types of deterministic nowcasts are not directly comparable because they have a different variance! 
+**Very important**: these two types of deterministic nowcasts are not directly comparable because they have a different variance!
 This is best explained by the decomposition of the mean squared error (MSE):
 
 :math:`MSE = bias^2 + Var`
@@ -41,7 +41,7 @@ Therefore, it is better to avoid directly comparing an error-minimization machin
 A deterministic equivalent of the ensemble mean can be approximated using the modules :py:mod:`pysteps.nowcasts.sprog` or :py:mod:`pysteps.nowcasts.anvil`.
 Another possibility, but more computationally demanding, is to average many ensemble members generated by the modules :py:mod:`pysteps.nowcasts.steps` or :py:mod:`pysteps.nowcasts.sseps`.
 
-Still, even by using the pysteps ensemble mean, it is not given that its variance will be the same as the one of machine learning predictions. 
+Still, even by using the pysteps ensemble mean, it is not given that its variance will be the same as the one of machine learning predictions.
 Possible solutions are to:
 
 #. use a normalized MSE (NMSE) or another score accounting for differences in the variance between prediction and observation.
@@ -94,10 +94,10 @@ The comparison of methods from different types should only be done carefully and
      - MSE, RMSE, MAE, ETS, etc or better normalized scores, etc
    * - Probabilistic (quantile-based)
      - Quantile ANN, quantile random forests, quantile regression
-     - Probabilities derived from :py:mod:`pysteps.nowcasts.steps`/:py:mod:`~pysteps.nowcasts.sseps`
+     - :py:mod:`pysteps.nowcasts.lagrangian_probability` or probabilities derived from :py:mod:`pysteps.nowcasts.steps`/:py:mod:`~pysteps.nowcasts.sseps`
      - Reliability diagram (predicted vs observed quantile), probability integral transform (PIT) histogram
    * - Probabilistic (ensemble-based)
      - GANs, VAEs, etc
      - Ensemble and probabilities derived from :py:mod:`pysteps.nowcasts.steps`/:py:mod:`~pysteps.nowcasts.sseps`
-     - Probabilistic verification: reliability diagrams, continuous ranked probability scores (CRPS), etc. 
+     - Probabilistic verification: reliability diagrams, continuous ranked probability scores (CRPS), etc.
        Ensemble verification: rank histograms, spread-error relationships, etc
diff --git a/examples/probability_forecast.py b/examples/probability_forecast.py
@@ -0,0 +1,130 @@
+#!/usr/bin/env python
+"""
+Probability forecasts
+=====================
+
+This example script shows how to forecast the probability of exceeding an
+intensity threshold.
+
+The method is based on the local Lagrangian approach described in Germann and
+Zawadzki (2004).
+"""
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+from pysteps.nowcasts.lagrangian_probability import forecast
+
+###############################################################################
+# Numerical example
+# -----------------
+
+# parameters
+precip = np.zeros((100, 100))
+precip[10:50, 10:50] = 1
+velocity = np.ones((2, *precip.shape))
+timesteps = [0, 2, 6, 12]
+thr = 0.5
+slope = 1  # pixels / timestep
+
+# compute probability forecast
+out = forecast(precip, velocity, timesteps, thr, slope=slope)
+# plot
+for n, frame in enumerate(out):
+    plt.subplot(2, 2, n + 1)
+    plt.imshow(frame, interpolation="nearest", vmin=0, vmax=1)
+    plt.title(f"t={timesteps[n]}")
+    plt.xticks([])
+    plt.yticks([])
+plt.show()
+
+###############################################################################
+# Real-data example
+# -----------------
+
+from datetime import datetime
+
+from pysteps import io, rcparams, utils
+from pysteps.motion.lucaskanade import dense_lucaskanade
+from pysteps.verification import reldiag_init, reldiag_accum, plot_reldiag
+
+# data source
+source = rcparams.data_sources["mch"]
+root = rcparams.data_sources["mch"]["root_path"]
+fmt = rcparams.data_sources["mch"]["path_fmt"]
+pattern = rcparams.data_sources["mch"]["fn_pattern"]
+ext = rcparams.data_sources["mch"]["fn_ext"]
+timestep = rcparams.data_sources["mch"]["timestep"]
+importer_name = rcparams.data_sources["mch"]["importer"]
+importer_kwargs = rcparams.data_sources["mch"]["importer_kwargs"]
+
+# read precip field
+date = datetime.strptime("201607112100", "%Y%m%d%H%M")
+fns = io.find_by_date(date, root, fmt, pattern, ext, timestep, num_prev_files=2)
+importer = io.get_method(importer_name, "importer")
+precip, __, metadata = io.read_timeseries(fns, importer, **importer_kwargs)
+precip, metadata = utils.to_rainrate(precip, metadata)
+# precip[np.isnan(precip)] = 0
+
+# motion
+motion = dense_lucaskanade(precip)
+
+# parameters
+nleadtimes = 6
+thr = 1  # mm / h
+slope = 1 * timestep  # pixels / timestep (1 km / min, assuming a 1 km grid)
+
+# compute probability forecast
+extrap_kwargs = dict(allow_nonfinite_values=True)
+fct = forecast(
+    precip[-1], motion, nleadtimes, thr, slope=slope, extrap_kwargs=extrap_kwargs
+)
+
+# plot
+for n, frame in enumerate(fct):
+    plt.subplot(2, 3, n + 1)
+    plt.imshow(frame, interpolation="nearest", vmin=0, vmax=1)
+    plt.xticks([])
+    plt.yticks([])
+plt.show()
+
+################################################################################
+# Let's plot one single leadtime in more detail.
+
+plt.imshow(fct[2], interpolation="nearest", vmin=0, vmax=1)
+plt.xticks([])
+plt.yticks([])
+plt.colorbar(label="Exceedance probability, P(R>=1 mm/h)")
+plt.show()
+
+###############################################################################
+# Verification
+# ------------
+
+# verifying observations
+importer = io.get_method(importer_name, "importer")
+fns = io.find_by_date(
+    date, root, fmt, pattern, ext, timestep, num_next_files=nleadtimes
+)
+obs, __, metadata = io.read_timeseries(fns, importer, **importer_kwargs)
+obs, metadata = utils.to_rainrate(obs, metadata)
+obs[np.isnan(obs)] = 0
+
+# reliability diagram
+reldiag = reldiag_init(thr)
+reldiag_accum(reldiag, fct, obs[1:])
+fig, ax = plt.subplots()
+plot_reldiag(reldiag, ax)
+ax.set_title("Reliability diagram")
+plt.show()
+
+
+###############################################################################
+# References
+# ----------
+# Germann, U. and I. Zawadzki, 2004:
+# Scale Dependence of the Predictability of Precipitation from Continental
+# Radar Images. Part II: Probability Forecasts.
+# Journal of Applied Meteorology, 43(1), 74-89.
+
+# sphinx_gallery_thumbnail_number = 3
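
The `slope` argument of `forecast` is expressed in pixels per time step, whereas Germann and Zawadzki (2004) report the optimal value in km / minute. The conversion between the two can be sketched as follows. The helper name and its arguments are assumptions made for illustration, and the 1 km pixel size in the usage line is an assumed value rather than one read from the data.

```python
def slope_in_pixels_per_timestep(slope_km_per_min, timestep_min, pixel_size_km):
    """Convert a scale-growth rate from km / min to pixels / timestep.

    Hypothetical helper for illustration only; not part of pysteps.
    """
    if timestep_min <= 0 or pixel_size_km <= 0:
        raise ValueError("timestep_min and pixel_size_km must be positive")
    # km/min * min/timestep gives km/timestep; dividing by km/pixel gives
    # pixels/timestep.
    return slope_km_per_min * timestep_min / pixel_size_km


# 1 km / min with 5-minute time steps on an assumed 1 km grid
slope = slope_in_pixels_per_timestep(1.0, 5.0, 1.0)
```

Under these assumptions the result is 5 pixels per time step, which coincides with the default `slope=5` of `forecast`.
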
diff --git a/pysteps/nowcasts/interface.py b/pysteps/nowcasts/interface.py
@@ -33,12 +33,15 @@
 
 from pysteps.extrapolation.interface import eulerian_persistence
 from pysteps.nowcasts import anvil, sprog, steps, sseps, extrapolation
+from pysteps.nowcasts import lagrangian_probability
 
 _nowcast_methods = dict()
 _nowcast_methods["anvil"] = anvil.forecast
 _nowcast_methods["eulerian"] = eulerian_persistence
 _nowcast_methods["extrapolation"] = extrapolation.forecast
 _nowcast_methods["lagrangian"] = extrapolation.forecast
+_nowcast_methods["probability"] = lagrangian_probability.forecast
+_nowcast_methods["lagrangian_probability"] = lagrangian_probability.forecast
 _nowcast_methods["sprog"] = sprog.forecast
 _nowcast_methods["sseps"] = sseps.forecast
 _nowcast_methods["steps"] = steps.forecast
@@ -65,6 +68,9 @@ def get_method(name):
     |  lagrangian or  | this approach extrapolates the last observation       |
     |  extrapolation  | using the motion field (Lagrangian persistence)       |
     +-----------------+-------------------------------------------------------+
+    |  lagrangian_\   | this approach computes local Lagrangian probability   |
+    |  probability    | forecasts of threshold exceedances.                   |
+    +-----------------+-------------------------------------------------------+
     |  sprog          | the S-PROG method described in :cite:`Seed2003`       |
     +-----------------+-------------------------------------------------------+
     |  steps          | the STEPS stochastic nowcasting method described in   |
@@ -74,9 +80,6 @@ def get_method(name):
     |  sseps          | short-space ensemble prediction system (SSEPS).       |
     |                 | Essentially, this is a localization of STEPS.         |
     +-----------------+-------------------------------------------------------+
-
-    steps and sseps produce stochastic nowcasts, and the other methods are
-    deterministic.
     """
     if isinstance(name, str):
         name = name.lower()
diff --git a/pysteps/nowcasts/lagrangian_probability.py b/pysteps/nowcasts/lagrangian_probability.py
@@ -0,0 +1,131 @@
+# -*- coding: utf-8 -*-
+"""
+pysteps.nowcasts.lagrangian_probability
+=======================================
+
+Implementation of the local Lagrangian probability nowcasting technique
+described in :cite:`GZ2004`.
+
+.. autosummary::
+    :toctree: ../generated/
+
+    forecast
+"""
+
+import numpy as np
+from scipy.signal import convolve
+
+from pysteps.nowcasts import extrapolation
+
+
+def forecast(
+    precip,
+    velocity,
+    timesteps,
+    threshold,
+    extrap_method="semilagrangian",
+    extrap_kwargs=None,
+    slope=5,
+):
+    """
+    Generate a probability nowcast by a local Lagrangian approach. The output is
+    the probability of exceeding a given intensity threshold, i.e.
+    P(precip>=threshold).
+
+    Parameters
+    ----------
+    precip: array_like
+       Two-dimensional array of shape (m,n) containing the input precipitation
+       field.
+    velocity: array_like
+       Array of shape (2,m,n) containing the x- and y-components of the
+       advection field. The velocities are assumed to represent one time step
+       between the inputs.
+    timesteps: int or list of floats
+       Number of time steps to forecast or a sorted list of time steps for which
+       the forecasts are computed (relative to the input time step).
+       The number of time steps has to be a positive integer.
+       The elements of the list are required to be in ascending order.
+    threshold: float
+       Intensity threshold for which the exceedance probabilities are computed.
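+    extrap_method: str, optional
+       Name of the extrapolation method to use. See the documentation of
+       pysteps.extrapolation.interface.
+    extrap_kwargs: dict, optional
+       Optional dictionary containing keyword arguments for the extrapolation
+       method, passed on to pysteps.nowcasts.extrapolation.forecast.
+    slope: float, optional
+       The slope of the relationship between optimum scale and lead time in
+       pixels / timestep. Germann and Zawadzki (2004) found the optimal slope
+       to be equal to 1 km / minute.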
+
+    Returns
+    -------
+    out: ndarray
+        Three-dimensional array of shape (num_timesteps, m, n) containing a time
+        series of nowcast exceedance probabilities. The time series starts from
+        t0 + timestep, where timestep is taken from the advection field velocity.
+
+    References
+    ----------
+    Germann, U. and I. Zawadzki, 2004:
+    Scale Dependence of the Predictability of Precipitation from Continental
+    Radar Images. Part II: Probability Forecasts.
+    Journal of Applied Meteorology, 43(1), 74-89.
+    """
+    # Compute deterministic extrapolation forecast
+    if isinstance(timesteps, int) and timesteps > 0:
+        timesteps = np.arange(1, timesteps + 1)
+    elif not isinstance(timesteps, list):
+        raise ValueError(f"invalid value for argument 'timesteps': {timesteps}")
+    precip_forecast = extrapolation.forecast(
+        precip,
+        velocity,
+        timesteps,
+        extrap_method,
+        extrap_kwargs,
+    )
+
+    # Ignore missing values
+    nanmask = np.isnan(precip_forecast)
+    precip_forecast[nanmask] = threshold - 1
+    valid_pixels = (~nanmask).astype(float)
+
+    # Compute exceedance probabilities using a neighborhood approach
+    precip_forecast = (precip_forecast >= threshold).astype(float)
+    for i, timestep in enumerate(timesteps):
+        scale = timestep * slope
+        if scale == 0:
+            continue
+        kernel = _get_kernel(scale)
+        kernel_sum = convolve(
+            valid_pixels[i, ...],
+            kernel,
+            mode="same",
+        )
+        precip_forecast[i, ...] = convolve(
+            precip_forecast[i, ...],
+            kernel,
+            mode="same",
+        )
+        precip_forecast[i, ...] /= kernel_sum
+    precip_forecast = np.clip(precip_forecast, 0, 1)
+    precip_forecast[nanmask] = np.nan
+    return precip_forecast
+
+
+def _get_kernel(size):
+    """
+    Generate a circular kernel.
+
+    Parameters
+    ----------
+    size : int
+        Size of the circular kernel (its diameter). For size < 5, the kernel is
+        a square instead of a circle.
+
+    Returns
+    -------
+    2-D array with kernel values
+    """
+    middle = max((int(size / 2), 1))
+    if size < 5:
+        return np.ones((size, size), dtype=np.float32)
+    else:
+        xx, yy = np.mgrid[:size, :size]
+        circle = (xx - middle) ** 2 + (yy - middle) ** 2
+        return np.asarray(circle <= (middle ** 2), dtype=np.float32)
diff --git a/pysteps/tests/test_interfaces.py b/pysteps/tests/test_interfaces.py
diff --git a/pysteps/tests/test_nowcasts_lagrangian_probability.py b/pysteps/tests/test_nowcasts_lagrangian_probability.py