This project builds a Dirichlet regression model for forecasting the 2020 US Senate elections. We collect historical polling data and first build a Gaussian process (GP) model that captures the underlying time-evolving voter preferences. The Dirichlet model then integrates the posteriors of voter preferences on election day with other election fundamentals, such as the state-level partisan voting index, candidates' past experience, and yearly swings. When forecasting the 2020 races, our model naturally produces a posterior distribution over future vote shares rather than point estimates.
To run this project, users need MATLAB and R installed locally. The GPML MATLAB toolbox is already included in the directory.
- GPML in MATLAB
addpath("gpml-matlab-v3.6-2015-07-07");
startup;
- packages in R
library(rstan)
library(MCMCpack)
library(dplyr)
library(ggridges)
library(ggplot2)
library(grid)
- Clone the repo
git clone https://github.com/yahoochen97/CNNForecasting.git
- Install R packages
install.packages('rstan')
install.packages('MCMCpack')
install.packages('dplyr')
install.packages('ggridges')
install.packages('ggplot2')
# grid ships with base R and does not need to be installed
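Before running anything, it can help to confirm that the two executables used by the commands in this README are on the PATH. The following is a minimal sketch, not part of the repository; the function names `check_tool` and `check_prereqs` are hypothetical.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "ok: <name>" and returns 0, or prints "missing: <name>" and returns 1.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two executables that the commands in this README invoke.
check_prereqs() {
    status=0
    for tool in matlab Rscript; do
        check_tool "$tool" || status=1
    done
    return "$status"
}
```

Calling `check_prereqs` prints one line per tool and returns a non-zero status if either is missing. It does not check that the R packages listed above are installed.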
We use a leave-one-year-out (LOYO) validation procedure to determine the optimal hyperparameters of our GP model for different forecasting horizons. To train the model at the predefined horizons, run the MATLAB code
matlab -nodisplay -nodesktop -r "main('GP', 1); exit"
This generates files containing the GP posteriors of the 1992-2016 races for the different forecasting horizons and GP hyperparameters. Then, for each LOYO year and horizon, run the R code
Rscript ./onejob horizon year 'GP'
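Running the command above once per year and horizon can be scripted. The sketch below rests on two assumptions: the horizon values are hypothetical placeholders (the actual horizons are predefined in the repository code), and the LOYO years are taken to be every second year from 1992 to 2016. It only prints the commands unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical horizons; replace with the horizons predefined in the repository.
HORIZONS="${HORIZONS:-0 14 28 42}"

# Print one onejob command per (horizon, LOYO year) pair.
# Assumes one election cycle every two years from 1992 to 2016.
loyo_jobs() {
    for horizon in $HORIZONS; do
        year=1992
        while [ "$year" -le 2016 ]; do
            echo "Rscript ./onejob $horizon $year 'GP'"
            year=$((year + 2))
        done
    done
}

# Dry run by default: list the commands without executing them.
# Set RUN=1 to execute each command in sequence and report failures.
if [ "${RUN:-0}" = "1" ]; then
    loyo_jobs | while IFS= read -r cmd; do
        sh -c "$cmd" || echo "failed: $cmd" >&2
    done
else
    loyo_jobs
fi
```

Because the jobs are independent of one another, the printed list could also be handed to a cluster scheduler instead of being run sequentially.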
The average negative log likelihood (nlZ) of the validation races for each hyperparameter setting is stored in the directory './nlZs'. To get the index of the optimal hyperparameter setting in the search sequence, run the R code
Rscript loocv_nlZs.R
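The selection rule is to pick the hyperparameter setting with the smallest average nlZ. The sketch below illustrates that rule on made-up numbers. It is not a replacement for loocv_nlZs.R, and it assumes a simple one-value-per-line input rather than the actual layout of the files in './nlZs', which is not described here. The function name `argmin_line` is hypothetical.

```shell
#!/bin/sh
# Print the 1-based line number of the smallest value in the first column.
# Ties resolve to the earliest line.
argmin_line() {
    awk 'NR == 1 || $1 + 0 < min { min = $1 + 0; idx = NR } END { print idx }' "$@"
}

# Hypothetical average nlZ values, one per hyperparameter setting.
printf '%s\n' 1.32 0.97 1.05 | argmin_line   # prints 2
```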
To forecast the 2020 races at the predefined horizons, run the MATLAB and R code
matlab -nodisplay -nodesktop -r "main('GP', 3); exit"
Rscript main.R 2020 'GP'
Distributed under the MIT License. See LICENSE for more information.
Yehu Chen - chenyehu@wustl.edu
Project Link: https://github.com/yahoochen97/CNNForecasting