Welcome to the ODDRIN code, which comprises both a front-end visualisation component, known as ODD-Mapping, and the back-end statistical engine and real-time updating software, IIDIPUS (Integrated Internal DIsplaced PopUlation Sampler).
The aim of this software is to predict the number of people displaced, the number of fatalities, and the number of buildings damaged in the early phases of rapid-onset natural (and human-generated, climate-change-related) hazards. Predictions are made as accurate as possible by training the model on hundreds of events and hundreds of thousands of damaged buildings, across a broad range of countries and hazard severities.
ODDRIN was designed, developed, made operational and is managed by Dr. Hamish Patten @HamishPatten, Max Anderson Loake @MaxLoake and Professor David Steinsaltz @DavidSteinsaltz, as part of a project developed at the Department of Statistics, University of Oxford.
A Bayesian Approach to Disaster Impact Modelling - This pre-print, submitted to the Royal Statistical Society (RSS), details the model, method, and results.
Data-Driven Earthquake Multi-impact Modeling: A Comparison of Models - Published in the International Journal of Disaster Risk Science, this paper compares a range of machine-learning approaches to earthquake impact modelling.
IDMC GRID 2021 Background Paper - A non-peer-reviewed article written for the Internal Displacement Monitoring Centre (IDMC) in early 2021, to be included with the Global Report on Internal Displacement (GRID).
The code can be decomposed into several sections:
- Main
- Model
- Method
- Data
- Object Orientated Programming (OOP) class formation
- Additional functions
Here we briefly explain the most important files in each section, without going into too much detail.
Main.R
- This is where the ODDRIN model parameterisation occurs. Here we extract the pre-formatted data, model formulas and structuring, and then run the model-training algorithm to parameterise the model.
AutoQuake.R
- With minimal input required, this file automates the extraction of everything necessary to predict the spatial distribution and magnitude of the mortality, displaced population, and building damage in the immediate aftermath of earthquakes, including foreshocks and aftershocks.
RealTimeIIDIPUS.R
(not yet included) - Real-time tracking of the occurrence of rapid-onset hazards, including predicting the magnitude and spatial distribution of the displaced population, then broadcasting this to partners, such as the IFRC GO Platform.
IIDIPUSModelTraining
- Extracts the data, model and methodology, then trains the model using an Adaptive Markov Chain Monte Carlo (AMCMC) algorithm or an ABC Sequential Monte Carlo (ABC-SMC) algorithm.
AutoQuake
- Extracts the earthquake intensity data, creates an object that automatically extracts the relevant exposure and vulnerability information, then makes a prediction on the fatalities, population displacement and building damage, per gridpoint.
Model.R
- Here we can find everything model-related. This includes the damage function equation definitions, the declaration of which imported vulnerability indicators are used, and, finally, the pseudo-marginal log-likelihood, prior and posterior distribution equations for population displacement and satellite building damage estimations. It also includes the linear predictor terms that parameterise the systemic vulnerability.
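To give a feel for how a damage function and a linear predictor can fit together, the sketch below uses a normal-CDF (probit-style) damage curve in hazard intensity whose threshold is shifted by a linear predictor built from vulnerability indicators. The function names, the indicator values and the functional form are illustrative assumptions only; ODDRIN's actual equations are those in Model.R and the pre-print.

```r
# Hypothetical sketch, NOT ODDRIN's model: a probit-style damage curve whose
# threshold is shifted by a linear predictor of vulnerability indicators.

linear_predictor <- function(indicators, beta) {
  # indicators: named numeric vector; beta: coefficients with matching names
  stopifnot(all(names(beta) %in% names(indicators)))
  sum(beta * indicators[names(beta)])
}

damage_prob <- function(intensity, lp, mu = 7, sigma = 1) {
  # Probability of impact at a given hazard intensity. A larger linear
  # predictor (more vulnerable) lowers the effective threshold mu - lp.
  pnorm((intensity - (mu - lp)) / sigma)
}

# Illustrative (made-up) standardised indicator values and coefficients
lp <- linear_predictor(c(SHDI = -0.5, Vs30 = 0.2), c(SHDI = -0.4, Vs30 = 0.1))
p_impact <- damage_prob(intensity = 7.5, lp = lp)
```

Putting the linear predictor inside the threshold means vulnerability shifts the whole damage curve along the intensity axis rather than changing its shape.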
HighLevelPriors
- Approximate Bayesian Computation (ABC) rejection method
GetLP
- Calculates the exposure-related component of the vulnerability over all grid cells (e.g. using the SHDI data)
GetLP_single
- Calculates the exposure-related component of the vulnerability for a single grid cell
getLP_event
- Calculates the hazard-related component of the vulnerability (e.g. using the night time indicator)
addTransfParams
- Transforms parameters to reduce the correlation between them
SamplePolyImpact
- Samples the impact for each event in the provided event set
SamplePointImpact
- Samples the impact for each building in the point building dataset
CalcDist
- Calculates the loss function comparing the sampled and observed data
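HighLevelPriors and CalcDist both rest on comparing simulated and observed quantities. As an illustration of the general technique only, here is a generic ABC rejection sampler on a toy Poisson model. The toy model, the absolute-difference distance, the tolerance and all function names are assumptions; this is not the HighLevelPriors implementation.

```r
# Generic ABC rejection sampler (illustrative toy, not ODDRIN code).
# Draw parameters from the prior, simulate data, and keep the draws whose
# simulated output lies within `tol` of the observation.
abc_rejection <- function(n_draws, observed, prior_sampler, simulator,
                          distance, tol) {
  draws <- vapply(seq_len(n_draws), function(i) prior_sampler(), numeric(1))
  dists <- vapply(draws, function(theta) distance(simulator(theta), observed),
                  numeric(1))
  draws[dists <= tol]
}

set.seed(1)
obs <- 50   # toy observed count (e.g. thousands of people displaced)
accepted <- abc_rejection(
  n_draws       = 5000,
  observed      = obs,
  prior_sampler = function() runif(1, 0, 200),   # flat prior on the rate
  simulator     = function(rate) rpois(1, rate), # toy impact simulator
  distance      = function(sim, obs) abs(sim - obs),
  tol           = 5
)
```

The accepted draws approximate the posterior of the rate; shrinking `tol` makes the approximation more accurate at the cost of accepting fewer draws.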
Method.R
- Defines the two algorithms used to parameterise the model via likelihood-free Bayesian statistics. The options are the adaptive MCMC algorithm described in Spencer, 2021, and the ABC-SMC algorithm described in Del Moral, Doucet and Jasra, 2012.
GetInitialValues.R
- This file allows the initialisation of the AMCMC algorithm, either by using samples from past model runs or samples from the prior to estimate an appropriate proposal covariance.
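A common way to turn past samples into a random-walk proposal covariance is to take their empirical covariance and scale it by 2.38^2/d, the classic adaptive-Metropolis heuristic for a d-dimensional target. The sketch below assumes the samples are stored as a matrix (one row per sample) on the unconstrained proposal scale; the function name, the scaling and the small ridge term are assumptions, not necessarily what GetInitialValues.R does.

```r
# Hypothetical helper: estimate a random-walk proposal covariance from
# previous samples. samples: one row per sample, one column per parameter,
# assumed to be on the unconstrained (proposal) scale.
proposal_cov <- function(samples, eps = 1e-6) {
  d <- ncol(samples)
  sigma <- cov(samples)
  # 2.38^2 / d scaling plus a small ridge to keep the matrix positive definite
  (2.38^2 / d) * sigma + eps * diag(d)
}

set.seed(2)
past <- matrix(rnorm(2000), ncol = 2)   # stand-in for samples from a past run
P <- proposal_cov(past)
```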
Proposed2Physical and Physical2Proposed
- Link functions for the model parameters, which ensure that the parameters sampled by the MCMC proposal distribution lie on the real line with infinite support.
multvarNormProp
- Proposal parameter set generation function (multivariate normal distribution)
AMCMC
- Runs the adaptive MCMC algorithm
ABCSMC
- Runs the ABC-SMC algorithm
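To make the role of the link functions and of a multivariate normal proposal concrete, here is a minimal sketch for two hypothetical parameters: one strictly positive (mapped with a log) and one in (0, 1) (mapped with a logit). Proposals are drawn on the unconstrained scale and mapped back, so they always respect the bounds. The parameter names, transforms and function names are assumptions; ODDRIN's Proposed2Physical/Physical2Proposed may use different transforms.

```r
# Hypothetical link functions (not ODDRIN's): physical <-> unconstrained scale
physical_to_proposed <- function(p) {
  c(sigma = log(p[["sigma"]]),     # sigma > 0  -> real line
    pi0   = qlogis(p[["pi0"]]))    # 0 < pi0 < 1 -> real line
}
proposed_to_physical <- function(z) {
  c(sigma = exp(z[["sigma"]]),
    pi0   = plogis(z[["pi0"]]))
}

# Multivariate normal random-walk proposal via the Cholesky factor:
# if u ~ N(0, I) and t(R) %*% R = Sigma, then t(R) %*% u ~ N(0, Sigma).
mvn_propose <- function(z, Sigma) {
  R <- chol(Sigma)
  z + drop(t(R) %*% rnorm(length(z)))
}

set.seed(3)
p <- c(sigma = 0.5, pi0 = 0.2)
z_new <- mvn_propose(physical_to_proposed(p), diag(c(0.01, 0.01)))
p_new <- proposed_to_physical(z_new)   # sigma > 0 and 0 < pi0 < 1 by construction
```

If the priors are defined on the physical scale, a Metropolis-Hastings acceptance ratio computed on the unconstrained scale also needs the log-Jacobian of the transform, here log(sigma) + log(pi0) + log(1 - pi0).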
GetPopDemo.R
- Population and demography data extraction, mostly built around the CIESIN data, but now also including the Facebook Data for Good Population Mapping data.
GetSatDamage.R
- Given the location of UNOSAT-UNITAR or Copernicus building damage assessment data files, this extracts the buildings and harmonises the data format so that ODDRIN can use it later on, when initialising a BD object.
GetDisaster.R
- Extracts the hazard intensity data, the source of which depends on the hazard type. For example, earthquakes rely on USGS.
GetGDACS.R
- This file is really hideous, I apologise; I was learning to code in R at the time. It accesses the Global Disaster Alert and Coordination System (GDACS) database, which is key to the real-time component of ODDRIN.
GetUSGS.R
- Accesses earthquake shakemaps and other information automatically from the United States Geological Survey (USGS).
AddVulnerability.R
- Extracts and adds the vulnerability data related to the exposure, such as the Vs30 data (from USGS) and the SHDI data (from Global Data Lab).
GetWorldBank.R
- All things World Bank: nationally aggregated values, but with temporal trends. For example, it is easy to access temporal trends of population count for most countries around the world, and even to see when the data were last updated via national surveys.
GetOSM.R
- The file that accesses OpenStreetMap, including downloading buildings and roads located within a bounding box, country or region polygon. Be careful what you wish for here... if your search is too broad, you'll never be able to access anything! Go in small chunks and slowly cover the area you want.
GetBuildingCounts.R
- The file that accesses building footprint data from Microsoft/Bing Building Footprint datasets.
ExtractData
- This is the function that extracts everything ODDRIN needs for a given hazard occurrence. However, this is currently only automated for earthquakes. Provided with a collection of estimates of the maximum observed displaced population of an event (e.g. from IDMC), including the date of the event and the country, this function finds the matching event in GDACS and creates the ODD objects, then does the same for the satellite image-based building damage assessment data.
GetPopulationBbox
- Provided only with a bounding box and the folder name for the population data, this function extracts the CIESIN population data in a memory-efficient way, also ensuring things like continuity across the longitude = 0 meridian.
InterpPopWB and InterpGDPWB
- These functions extract the nationally aggregated population and GDP values, respectively, from the World Bank for exactly the date provided (through interpolation/extrapolation), which are used to update the CIESIN and Kummu population and GDP values to reflect their values on the day of the hazard.
ExtractBDfiles
- Provided with the location of the folder where the satellite image-based building damage assessment data are kept, this function extracts all UNOSAT and Copernicus data, ready to create an instance of the BD class.
GetDisaster
- Provided with only minimal input (bounding box, start and end date, hazard type), this function extracts hazard intensity raster data and outputs HAZARD objects made from the data.
GetEarthquake
- Automated extraction of earthquakes from USGS, forming a list of HAZARD objects.
FilterGDACS
- Extracts data from GDACS through their API, then filters it to keep only what we need.
GetUSGS
- For earthquakes, GetDisaster depends entirely on this function, which accesses the USGS database given minimal input, extracts the important data and forms a HAZARD class instance from it.
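InterpPopWB and InterpGDPWB are described as interpolating or extrapolating World Bank values to the exact date of the hazard. A minimal base-R sketch of that idea follows, assuming straight-line interpolation between the two nearest observations and straight-line extrapolation from the two nearest end points. The function name, the linear scheme and all numbers are hypothetical; the last lines show how such a value could rescale a gridded population estimate to the event date.

```r
# Hypothetical helper: value of an annual series on an arbitrary date,
# by linear interpolation inside the observed range and linear
# extrapolation from the two nearest observations outside it.
interp_to_date <- function(dates, values, target) {
  x <- as.numeric(as.Date(dates))
  t <- as.numeric(as.Date(target))
  o <- order(x)
  x <- x[o]
  y <- values[o]
  n <- length(x)
  stopifnot(n >= 2)
  if (t > x[1] && t < x[n]) {
    return(approx(x, y, xout = t)$y)
  }
  # On or beyond the edge of the range: extend the line through the
  # two nearest observations
  i <- if (t <= x[1]) c(1, 2) else c(n - 1, n)
  y[i[1]] + (y[i[2]] - y[i[1]]) * (t - x[i[1]]) / (x[i[2]] - x[i[1]])
}

# Illustrative numbers only
wb_dates <- c("2015-01-01", "2016-01-01", "2017-01-01")
wb_pop   <- c(100, 104, 110)           # national totals (millions)
grid_pop <- c(1.2, 3.4, 0.8)           # gridded counts referenced to 2015
scale_factor <- interp_to_date(wb_dates, wb_pop, "2018-06-01") /
  interp_to_date(wb_dates, wb_pop, "2015-01-01")
grid_pop_event <- grid_pop * scale_factor
```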
ODDobj.R
- The principal ODDRIN class, in which hazard intensities (from all hazards included), exposed population and exposed buildings, as well as vulnerability information, are all included as fields/attributes. The methods of the class greatly facilitate automating the initialisation of objects from only minimal input data; for example, the interpolation of hazard intensities onto the population grid is automated.
BDobj.R
- This class is for the satellite image-based building damage assessment data. The main difference from the ODD class is that the data are not on a grid but form a list of points in space, onto which the hazard intensities and other information are interpolated.
HAZARDobj.R
- Hazard intensity data are read in and then structured into the correct form to be provided to the ODD or BD classes. This class is used heavily by GetUSGS.R, for example.
initialize
- Initialises the objects, with a separate initialisation function for each of the classes mentioned above
DispX
- Predicts the number of people displaced, based on the hazard, the model and the parameterisation
readODD
- Reads in a saved ODD file (stored as RDS). For BD and HAZARD objects, the relevant functions are readBD() and readHAZ(), respectively.
saveODD
- Saves an ODD file (stored as RDS). For BD and HAZARD objects, the relevant functions are saveBD() and saveHAZ(), respectively.
BDX
- Predicts the building damage classification level, based on the hazard, the model and the parameterisation
BDinterpODD
- Facilitates creating an instance of the BD class by providing the instance of the ODD class that corresponds to the same hazard(s).
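The classes above follow a common pattern: a class definition, an initialize method that builds the object from minimal input, and save/read helpers wrapping RDS files. A toy version of that pattern in R's S4 system (from the methods package) is sketched below. The class ToyHAZARD, its slots, the threshold filtering and the helper names are all hypothetical and much simpler than the real ODD, BD and HAZARD classes.

```r
library(methods)

# Hypothetical toy class illustrating the initialize/save/read pattern
setClass("ToyHAZARD",
         slots = c(eventid = "character",
                   intensity = "numeric",
                   threshold = "numeric"))

# Build the object from minimal input, keeping only intensities at or
# above the threshold (an illustrative pre-processing step)
setMethod("initialize", "ToyHAZARD",
          function(.Object, eventid = character(), intensity = numeric(),
                   threshold = 4.5) {
            .Object@eventid <- eventid
            .Object@intensity <- intensity[intensity >= threshold]
            .Object@threshold <- threshold
            .Object
          })

# RDS wrappers in the spirit of saveHAZ()/readHAZ()
saveToyHAZ <- function(obj, path) saveRDS(obj, file = path)
readToyHAZ <- function(path) readRDS(path)

h <- new("ToyHAZARD", eventid = "EQ-example", intensity = c(3.9, 5.2, 6.8))
f <- tempfile(fileext = ".rds")
saveToyHAZ(h, f)
h2 <- readToyHAZ(f)
```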
Functions.R
- Includes all the miscellaneous functions that are required by the ODDRIN code.
convRaster2SPDF
- Converts rasters imported from a .tif file into the SpatialPixelsDataFrame format that ODDRIN relies on.
convMat2SPDF
- Same, but from matrix format to SpatialPixelsDataFrame format.
coords2country
- Given a longitude and latitude coordinate, which country does it belong to?
countriesbbox
- Given the ISO3C code for a country, what is that country's bounding box?
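The first step of a matrix-to-SpatialPixelsDataFrame conversion such as convMat2SPDF is flattening the matrix into one row per grid cell with its coordinates. A base-R sketch of that step follows. The function name and the orientation convention (rows indexed by longitude, columns by latitude) are assumptions, so check the orientation of your own matrices first.

```r
# Hypothetical helper: flatten a gridded matrix into a long data frame.
# Assumes mat[i, j] is the value at longitude lon[i] and latitude lat[j].
mat_to_grid_df <- function(mat, lon, lat, name = "value") {
  stopifnot(nrow(mat) == length(lon), ncol(mat) == length(lat))
  # expand.grid varies its first argument fastest, matching R's
  # column-major storage, so as.vector(mat) lines up with the coordinates
  grid <- expand.grid(Longitude = lon, Latitude = lat)
  grid[[name]] <- as.vector(mat)
  grid
}

mat <- matrix(1:6, nrow = 2)           # 2 longitudes x 3 latitudes
df <- mat_to_grid_df(mat, lon = c(0, 1), lat = c(10, 11, 12))
```

The coordinate columns and the value column could then be passed to the sp package's SpatialPixelsDataFrame(points, data) constructor.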
Before doing anything, please change the directory location environment variables dir and directory (these two must be equal) in the GetEnv.R file. The simplest installation of ODDRIN is to download and load only the most fundamentally important packages and to source only the files that you will need. To do this, set packred<-T in the file GetEnv.R. Installation in RStudio is simple; just run the following:
# Extract Environment Variables
source('RCode/GetEnv.R')
# Download and install the necessary packages:
source('RCode/GetODDPackages.R')
This will fail with an error if you have not already installed the necessary data (see 'Data to be downloaded manually' at the end of the installation instructions).
For a full installation of ODDRIN, the problem is getting the R package rJava to work. For Linux and Mac distributions, follow the installation instructions in 'Linux and Mac Installation' below. For Windows, follow the instructions under 'Windows Installation'.
Linux and Mac Installation
In order to install the full ODDRIN package on Linux and Mac distributions, open a terminal and run the following (the apt-get command assumes a Debian/Ubuntu-based system):
sudo apt-get install libcurl4-openssl-dev libxml2-dev libjq-dev libprotobuf-dev \
  libv8-dev protobuf-compiler openjdk-8-jdk libssh-dev libssl-dev \
  libgdal-dev libudunits2-dev libopenmpi-dev
This installs all sorts of important software, not just what rJava needs, but rJava is what causes the problems; note that the openjdk-8-jdk package is the difficult one. The next step is to make sure that you have an environment variable for the location of your Java libraries. In your /etc/environment file, add the Java libraries environment variable (using sudo nano /etc/environment):
LD_LIBRARY_PATH=/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/:/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/
Please check that the folder /usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/ actually exists! Otherwise, insert the folder location you find (another example could be LD_LIBRARY_PATH=/usr/lib/jvm/jre/lib/amd64:/usr/lib/jvm/jre/lib/amd64/default). NOW RESTART YOUR COMPUTER! Follow this up with:
source /etc/environment
sudo R CMD javareconf JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
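Before moving on, it can help to confirm from R that the Java library folder exists, that it is on LD_LIBRARY_PATH, and whether rJava can be loaded. The helper below is a hypothetical convenience in base R; its default path is the one used above and may differ on your machine.

```r
# Hypothetical pre-flight check for the rJava set-up described above
check_java_setup <- function(jvm_lib = "/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/",
                             ld_path = Sys.getenv("LD_LIBRARY_PATH")) {
  strip <- function(x) sub("/+$", "", x)   # ignore trailing slashes
  entries <- strip(strsplit(ld_path, ":", fixed = TRUE)[[1]])
  list(
    folder_exists = dir.exists(jvm_lib),
    on_ld_path    = strip(jvm_lib) %in% entries,
    rjava_loads   = requireNamespace("rJava", quietly = TRUE)
  )
}

# After restarting, run check_java_setup() in a fresh R session;
# all three entries should be TRUE once the set-up has worked.
```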
Finally, with packred<-F in the file GetEnv.R, run the following:
# Extract Environment Variables
source('RCode/GetEnv.R')
# Download and install the necessary packages:
source('RCode/GetODDPackages.R')
Windows Installation
To install the full ODDRIN package on Windows, first install Ubuntu from the Microsoft Store. This allows Linux commands to be run on a Windows machine. Follow all instructions as per the 'Linux and Mac Installation' section above EXCEPT the instruction to set LD_LIBRARY_PATH. Ubuntu does not permit the user to set LD_LIBRARY_PATH in the /etc/environment file, so run the following in Ubuntu instead:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/:/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/:/usr/lib/jvm/java-8-openjdk-amd64/lib/amd64/' >> ~/.bashrc
Follow all the remaining steps after setting LD_LIBRARY_PATH, as detailed in the 'Linux and Mac Installation' section above, i.e. check that the folder actually exists, restart your computer, etc. For the sudo R CMD javareconf step, please ensure that R is installed within Ubuntu; otherwise you will get a 'command not found' error.
Data to be downloaded manually
In addition to installing the necessary packages, you are also required to manually download several datasets due to licensing and access restrictions. Please follow the instructions below carefully:
- Global Data Lab Vulnerability Data (SHDI/SGDI)
You will need to download two sets of files:
  - The GDL shapefiles from the Global Data Lab website (requires a free account). Extract and place all files in:
    Demography_Data/SocioEconomic/GlobalDataLab/GDL Shapefiles V6/
    Ensure that the file shdi2022_World_large.shp is in that folder.
  - The CSV data containing SHDI/SGDI values, also from the Global Data Lab website (also requires a free account). Name the file exactly:
    SHDI-SGDI-Total 7.0.csv
    and place it in:
    Demography_Data/SocioEconomic/GlobalDataLab/
- VS30 dataset (soil shear wave velocity)
Download the dataset from the USGS VS30 page. Place the extracted global_vs30.tif and any auxiliary files in:
Hazard_Data/global_vs30_tif/
- Global Earthquake Hazard Frequency Data (PGA)
Download the most recent version of the PGA hazard data from the Global Earthquake Model's GSHM page. Extract the file named:
v2023_1_pga_475_rock_3min.tif
and place it in:
Hazard_Data/GEM-GSHM_PGA-475y-rock_v2023/
- High-resolution population count dataset
Note that the CIESIN data does not currently seem to be available online. This dataset is not strictly necessary: if it is not downloaded, coords2country() is used instead to label the country of each grid cell. If the data become available again, the download instructions are as follows. Download the GPWv4 population count dataset from CIESIN, using the Single Year option and the ASCII format at 30 arc-second resolution. You will need to download the files year by year (e.g., 2000, 2005, 2010, 2015), ensuring that each year has its own folder, e.g.:
Demography_Data/Population/gpw-v4-population-count-2015/
For the model, only the 2015 dataset is required, in particular the file named:
gpw_v4_population_count_adjusted_to_2015_unwpp_country_totals_rev11_2015_30_sec_1.asc
Make sure this file exists, i.e. that the following returns TRUE in R:
file.exists(paste0(dir,"Demography_Data/Population/gpw-v4-population-count-2015/gpw_v4_population_count_adjusted_to_2015_unwpp_country_totals_rev11_2015_30_sec_1.asc"))
Each folder should contain 8 .asc files (...30_sec_1.asc to ...30_sec_8.asc).
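Once the files above are in place, a quick way to confirm the paths is to test each one with file.exists(), as in the CIESIN check. The helper below does this for the required files listed in this section (the optional CIESIN file is left out); the function name is hypothetical, and this is not a replacement for InstallationChecks.R.

```r
# Paths (relative to dir) of the required manually downloaded files
required_files <- c(
  "Demography_Data/SocioEconomic/GlobalDataLab/GDL Shapefiles V6/shdi2022_World_large.shp",
  "Demography_Data/SocioEconomic/GlobalDataLab/SHDI-SGDI-Total 7.0.csv",
  "Hazard_Data/global_vs30_tif/global_vs30.tif",
  "Hazard_Data/GEM-GSHM_PGA-475y-rock_v2023/v2023_1_pga_475_rock_3min.tif"
)

# Hypothetical helper: report which required files are present under dir.
# dir is assumed to end in "/", as in paste0(dir, "Demography_Data/...").
check_manual_data <- function(dir, files = required_files) {
  data.frame(file = files,
             present = file.exists(paste0(dir, files)),
             stringsAsFactors = FALSE)
}

# Usage: check_manual_data(dir)[, "present"] should be all TRUE
```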
To run this software, you will need to set the following environment variables in the GetEnv.R file: directory (= dir, for the lazy writers), FBdirectory and packred. Note that FBdirectory is optional, but is important when using data extracted from the Facebook Data for Good platform.
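For reference, a minimal sketch of the variable definitions described above might look like the following. All paths are placeholders, and the real GetEnv.R may set more than this; the trailing "/" on dir is assumed because file paths are built with paste0(dir, ...).

```r
# Hypothetical GetEnv.R-style settings; replace the paths with your own.
dir <- "/home/username/ODDRIN/"                # trailing "/" (paths use paste0(dir, ...))
directory <- dir                               # must be equal to dir
FBdirectory <- "/home/username/FacebookData/"  # optional: Facebook Data for Good files
packred <- TRUE                                # TRUE: reduced install; FALSE: full install (needs rJava)
```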
Please run the InstallationChecks.R file:
source('RCode/InstallationChecks.R')
The files InstallationChecks.R, Main.R and AutoQuake.R provide usage examples.