The influence of model structure and geographic specificity on forecast accuracy among European COVID-19 forecasts
Katharine Sherratt (1), Rok Grah (2), Bastian Prasse (2), Friederike Becker (3), Jamie McLean (1), Sam Abbott (1), Sebastian Funk (1)
- (1) Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine
- (2) European Centre for Disease Prevention and Control
- (3) Institute of Statistics, Karlsruhe Institute of Technology
- A slide deck offers high-level context on what we were interested in, what we did, and what we found.
Accurately predicting the spread of infectious disease is essential for supporting public health during outbreaks. However, comparing the accuracy of different forecasting models is challenging. Existing evaluations struggle to separate the impact of model design choices (such as model structure or specificity to the forecast target) from the inherent difficulty of predicting complex outbreak dynamics. We introduce a novel approach that addresses this by systematically adjusting for common factors affecting epidemiological forecasts, accounting for multi-layered and non-linear effects on predictive difficulty.
We applied this approach to a large dataset of forecasts from 47 different models submitted to the European COVID-19 Forecast Hub. We adjusted for variation across epidemic dynamics, forecast horizon, location, time, and model-specific effects. This allowed us to isolate the impact of model structure and geographic specificity on predictive performance.
Our findings suggest that, after adjustment, apparent differences in performance between model structures became minimal, while models specific to a single location showed a slight performance advantage over multi-location models. Our work highlights the importance of accounting for predictive difficulty when comparing forecasting models, and provides a framework for more robust evaluation of infectious disease predictions.
Packages are managed using `renv`. To install all the required packages, install the `renv` package and run:

```r
renv::restore()
```
All the data used in the analysis are stored in the `data/` directory and were obtained from public sources. To re-download the data into the `data/` directory, run:
```r
library(here)

## Get metadata from a Google Sheet; save to data/
source(here("R", "get-metadata.R"))
## Get observed data and all Hub forecasts; apply forecast exclusions; save to data/
source(here("R", "import-data.R"))
```
To re-generate the forecast scores, run:

```r
## Score forecasts & ensembles on the log and natural scales; save to data/
source(here("R", "score.R"))
```
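The scoring step relies on the weighted interval score (WIS). As a hedged illustration of what that score measures, the base-R sketch below computes the WIS for a single quantile forecast using the interval-score decomposition described by Bracher et al. (2021). It is not the code in `R/score.R`: the functions `interval_score()` and `wis()` are hypothetical, the sketch assumes the quantile levels include the median (0.5) and come in symmetric pairs around it, and the `log(x + 1)` transform used for the log-scale example is an assumption.

```r
# Interval score for a central (1 - alpha) prediction interval [lower, upper]:
# interval width plus penalties when the observation y falls outside it.
interval_score <- function(y, lower, upper, alpha) {
  (upper - lower) +
    (2 / alpha) * (lower - y) * (y < lower) +
    (2 / alpha) * (y - upper) * (y > upper)
}

# Weighted interval score (hypothetical helper): assumes quantile_levels
# contain 0.5 and symmetric pairs (tau, 1 - tau) for every tau < 0.5.
wis <- function(y, quantile_levels, predictions) {
  median_pred <- predictions[quantile_levels == 0.5]
  lower_levels <- quantile_levels[quantile_levels < 0.5]
  alphas <- 2 * lower_levels
  # Weighted interval score term (alpha_k / 2) * IS_alpha_k for each interval
  interval_terms <- vapply(seq_along(lower_levels), function(k) {
    lower <- predictions[quantile_levels == lower_levels[k]]
    upper <- predictions[abs(quantile_levels - (1 - lower_levels[k])) < 1e-9]
    (alphas[k] / 2) * interval_score(y, lower, upper, alphas[k])
  }, numeric(1))
  K <- length(alphas)
  # Median absolute error gets weight 1/2; normalise by K + 1/2
  (0.5 * abs(y - median_pred) + sum(interval_terms)) / (K + 0.5)
}

q <- c(0.25, 0.5, 0.75)
wis(15, q, c(8, 10, 12))                      # natural scale: 13/3
wis(log(15 + 1), q, log(c(8, 10, 12) + 1))    # log scale (assumed log(x + 1))
```

Because the log transform is monotone, transformed quantiles stay ordered, so the same function can score forecasts on either scale.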
To fit the GAM to the scores, run:

```r
## Model the weighted interval score; save to data/
source(here("R", "model-wis.R"))
```
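The modelling step adjusts scores for factors that affect predictive difficulty before comparing model structure and geographic specificity. The sketch below illustrates that general idea with `mgcv::gam()` on simulated data only. The column names, the Gamma family with log link, and the choice of smooth and random-effect terms are assumptions for illustration, not the specification used in `R/model-wis.R`; terms for epidemic dynamics and time are omitted for brevity.

```r
library(mgcv)

set.seed(42)

# Hypothetical model-level metadata: structure and geographic specificity
models <- data.frame(
  model = paste0("model", 1:8),
  structure = rep(c("agent-based", "mechanistic", "statistical", "statistical"), times = 2),
  specificity = rep(c("multi-location", "single-location"), each = 4)
)

# Simulated scores: each row is the WIS of one forecast
n <- 600
scores <- data.frame(
  model = sample(models$model, n, replace = TRUE),
  location = sample(paste0("loc", 1:10), n, replace = TRUE),
  horizon = sample(1:4, n, replace = TRUE)
)
scores <- merge(scores, models, by = "model")
factor_cols <- c("model", "location", "structure", "specificity")
scores[factor_cols] <- lapply(scores[factor_cols], factor)

# Positive synthetic WIS whose mean grows with forecast horizon
scores$wis <- rgamma(n, shape = 2, rate = 2 / exp(0.3 * scores$horizon))

fit <- gam(
  wis ~ structure + specificity +   # effects of interest
    s(horizon, k = 3) +             # smooth effect of forecast horizon
    s(location, bs = "re") +        # random intercept per location
    s(model, bs = "re"),            # random intercept per model
  family = Gamma(link = "log"),
  data = scores,
  method = "REML"
)
summary(fit)
```

With a log link, exponentiated coefficients for `structure` and `specificity` read as multiplicative changes in expected WIS after adjusting for the other terms, which is one way to express "relative performance after adjustment".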
View results:
Re-generate the results PDF: