I modelled ridership on its own and with a few additional regressors. Several models built with Facebook's Prophet and LinkedIn's Greykite frameworks produced good results on data from before the early COVID-19 pandemic shutdowns, but they, like the ARIMA models, failed to capture the precipitous, unprecedented drop in monthly ridership and the exceedingly slow recovery that followed.
I included average fuel price, the increase in the number of registered cars, and consumer debt as additional regressors in both Prophet and Greykite, but none had a significant effect at that resolution.
Sourcing local data, at the ZIP-code or even county level, would likely yield better results.
=== CURRENTLY WORKING ON HOURLY DATA.
The equally unprecedented fuel prices further exemplify one of the main challenges in time-series modeling. This prompted a very simple plot comparing the rate of change of at-the-pump prices with the rate of change of petroleum barrel prices, alongside wheat and bread.
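As a rough illustration of the rate-of-change comparison, here is a minimal plain-Python sketch. The helper name `pct_change` and the price values are hypothetical, made up for illustration only; they are not the project's data, and the notebooks may compute this differently.

```python
def pct_change(series):
    """Return the period-over-period percent change of a list of prices."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(series, series[1:])
    ]


# Hypothetical monthly prices, for illustration only (not project data).
pump_price = [3.00, 3.30, 3.63]     # USD per gallon
barrel_price = [60.0, 72.0, 79.2]   # USD per barrel

pump_roc = pct_change(pump_price)
barrel_roc = pct_change(barrel_price)

# Print the two rates of change side by side for each period.
for period, (pump, barrel) in enumerate(zip(pump_roc, barrel_roc), start=1):
    print(f"period {period}: pump {pump:+.1f}%, barrel {barrel:+.1f}%")
```

Working in percent change rather than raw prices puts series with very different units (gallons, barrels, bushels, loaves) on a comparable scale.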
A little bit of Econ.
Data came from several sources: API requests and spreadsheets obtained directly from state sources. Details can be found in the data acquisition notebooks.
I've set up interactive visualizations using Plotly on my website. Ridership had been declining since late 2018, so initial forecasts predicted lower ridership in 2020 even before the pandemic.
Summary statistics for daily, weekly, monthly and annual ridership are shown below.
Model performance visualized below
A grid search over ARIMA orders yielded a (4,1,4) model.
Greykite model with extra regressors performed best on data prior to COVID
PROPHET model poorly predicted downward trend even prior to COVID decrease
PROPHET framework forecast within confidence
Greykite framework forecast pre-pandemic
Greykite framework components for all data
ID | MODEL | DATA | RMSE | MSE | MAE | MAPE | AIC |
---|---|---|---|---|---|---|---|
A | PROPHET | < 2020 | 446 152 | 199 052 198 567 | 375 686 | ||
B | PROPHET | All BART | 1 243 269 | 5 457 200 928 927 | 1 181 450 | ||
C | Greykite | All BART | 911 443 | 830 729 769 011 | 909 234 | 205 | 4 621 |
D | Greykite | < 2020 | 1 053 866 | 110 633 739 830 | 1 051 141 | 5.25 | 4 581 |
E | ARIMA | < 2020 | 208 415 | 43 437 169 639 | 89 738 | 5880 |
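
For reference, the error columns in the table can be computed from actual and predicted values roughly as follows. This is a plain-Python sketch: the function name `forecast_errors` and the sample numbers are hypothetical, and MAPE is assumed here to be expressed in percent, which may not match the convention used in the notebooks. AIC is left out because it depends on the fitted model's likelihood rather than on the forecast errors alone.

```python
import math


def forecast_errors(actual, predicted):
    """Compute MSE, RMSE, MAE and MAPE (in percent) for two equal-length lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up values for illustration only (not ridership data).
metrics = forecast_errors([100, 200, 400], [110, 190, 380])
print(metrics)
```

By definition RMSE is the square root of MSE, which gives a quick consistency check on any row that reports both.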
Although Prophet and Greykite are extremely powerful and flexible, tried-and-true ARIMA produced the best model for forecasting ridership, both for data inclusive of COVID shutdowns and without. As a true time-series model, ARIMA captured the changes better. Prophet and Greykite are built as standard linear regressors, with parameters and coefficients to model holidays, seasons, weekends, etc.; the limitations of monthly data negate the power and flexibility of frameworks designed to model just that.
Further Work: I am sourcing daily data from BART, so auto-regression on a daily, weekly, monthly and yearly basis should be straightforward. However, in-depth exogenous analysis is still limited by the granularity of such datasets. I am also working on mapping short- vs long-distance (commute) trips to assess whether COVID-related ridership differs for intra- vs inter-city trips.
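To make the contrast concrete, an ARIMA(4,1,4) model, the order selected by the grid search, can be written in backshift notation as below. The symbols are the usual textbook ones and are not taken from the notebooks; any constant or drift term is omitted, which is an assumption about the fitted model.

$$
\left(1 - \sum_{i=1}^{4} \phi_i B^i\right)(1 - B)\, y_t = \left(1 + \sum_{j=1}^{4} \theta_j B^j\right) \varepsilon_t
$$

Here $y_t$ is monthly ridership, $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_i$ are the four autoregressive coefficients, $\theta_j$ are the four moving-average coefficients, and $\varepsilon_t$ is white noise. The factor $(1 - B)$ is the single order of differencing, so the model describes month-to-month changes in terms of their own recent history and recent shocks rather than calendar features.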