
I use Facebook's Prophet, LinkedIn's Greykite, and classical ARIMA frameworks to forecast transit ridership; performance is limited by data granularity.


gigi-codes/ridership_forecast


Model Transit Ridership as Fuel Prices Fluctuate


SUMMARY

I modelled ridership on its own and with a few additional regressors. Various models built with Facebook's Prophet and LinkedIn's Greykite frameworks produced good results on data prior to the early COVID-pandemic shutdowns, but they, along with ARIMA models, failed to capture the precipitous, unprecedented drop in monthly ridership and the exceedingly slow rebound that followed.

I included average fuel price, the increase in registered cars, and consumer debt as additional regressors in both Prophet and Greykite, but none had a significant effect at monthly resolution.

Sourcing local data, at the ZIP-code or even county level, would likely yield better results.

=== CURRENTLY WORKING ON HOURLY DATA.

The equally unprecedented fuel prices further illustrate one of the main challenges in time-series modeling, forecasting values with no historical precedent, and prompted a very simple plot comparing the rate of change of at-the-pump fuel prices with that of petroleum barrel prices, and with wheat and bread prices.
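A minimal sketch of the rate-of-change comparison, assuming each series is a list of monthly prices in chronological order and that "rate of change" means period-over-period percentage change. The series names and numbers below are hypothetical placeholders, not the project's data.

```python
from typing import Dict, List


def pct_change(prices: List[float]) -> List[float]:
    """Period-over-period percentage change: 100 * (p_t - p_{t-1}) / p_{t-1}.

    Raises ZeroDivisionError if a previous price is 0.
    """
    return [100.0 * (curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def compare_rates(series: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Apply pct_change to every named series so the rates share one scale for plotting."""
    return {name: pct_change(values) for name, values in series.items()}


# Hypothetical placeholder values, for illustration only.
example = {
    "pump_price": [4.00, 4.40, 5.50],
    "crude_barrel": [80.0, 100.0, 120.0],
}
rates = compare_rates(example)
```

Comparing percentage changes rather than raw prices puts a gallon of fuel, a barrel of crude, and a loaf of bread on the same axis.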

A little bit of Econ.


DATA


Data came from several sources, via API requests and spreadsheets obtained directly from state sources. Details can be found in the data acquisition notebooks.


Exploratory Data Analyses

I've set up interactive visualizations using Plotly on my website. Ridership began declining in late 2018, so initial forecasts predicted lower ridership in 2020 even before the pandemic.

Summary statistics for daily, weekly, monthly and annual ridership are shown below.
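A minimal stdlib sketch of rolling daily counts up to the other resolutions before summarizing them, assuming daily ridership is held as a `date -> count` mapping (a hypothetical layout; a pandas `resample` would do the same job).

```python
import statistics
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, List


def period_totals(daily: Dict[date, int], key: Callable[[date], Hashable]) -> Dict[Hashable, int]:
    """Sum daily counts into the period that `key` maps each date to."""
    totals: Dict[Hashable, int] = defaultdict(int)
    for day, riders in daily.items():
        totals[key(day)] += riders
    return dict(totals)


def summarize(values: List[float]) -> Dict[str, float]:
    """Summary statistics for a list of period totals (stdev needs at least 2 values)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Period keys per resolution; ISO weeks avoid splitting a week across two years.
PERIOD_KEYS: Dict[str, Callable[[date], Hashable]] = {
    "daily": lambda d: d,
    "weekly": lambda d: tuple(d.isocalendar())[:2],  # (ISO year, ISO week)
    "monthly": lambda d: (d.year, d.month),
    "annual": lambda d: d.year,
}
```

For example, `summarize(list(period_totals(daily, PERIOD_KEYS["monthly"]).values()))` gives the monthly row of the summary.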


MODELS:

Model performance is visualized below.

A grid search over ARIMA orders yielded an ARIMA(4,1,4) model.

Greykite model with extra regressors performed best on data prior to COVID

PROPHET model poorly predicted the downward trend, even before the COVID decline

PROPHET framework forecast within confidence interval

Greykite framework forecast pre-pandemic

Greykite framework components for all data
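The grid search that produced the (4,1,4) order is not shown here; below is a minimal sketch of the selection loop, assuming candidates were ranked by AIC (the table does report AIC) over illustrative bounds of p and q in 0..4 and d in 0..1. Fitting is abstracted behind a caller-supplied `fit_aic` function (a hypothetical name) so the sketch stays library-agnostic.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple

Order = Tuple[int, int, int]


def grid_search_arima(
    fit_aic: Callable[[Order], float],
    p_values: Iterable[int] = range(5),
    d_values: Iterable[int] = range(2),
    q_values: Iterable[int] = range(5),
) -> Tuple[Optional[Order], float]:
    """Return the (p, d, q) order with the lowest AIC, plus that AIC.

    `fit_aic` fits one candidate and returns its AIC. Orders that fail to fit
    (raise) are skipped, since non-convergent candidates are common.
    """
    best_order: Optional[Order] = None
    best_aic = math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            score = fit_aic(order)
        except Exception:  # e.g. convergence or stationarity errors
            continue
        if score < best_aic:  # strict: ties keep the earlier (simpler) order
            best_order, best_aic = order, score
    return best_order, best_aic
```

With statsmodels, `fit_aic` could be `lambda order: ARIMA(y, order=order).fit().aic`, using `from statsmodels.tsa.arima.model import ARIMA` and a ridership series `y`. Information criteria are generally a poor guide to the differencing order d, so fixing d first and searching only p and q is a common alternative.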

ID  MODEL     DATA       RMSE        MSE                  MAE         MAPE   AIC
A   PROPHET   < 2020     446,152     199,052,198,567      375,686     -      -
B   PROPHET   All BART   1,243,269   5,457,200,928,927    1,181,450   -      -
C   Greykite  All BART   911,443     830,729,769,011      909,234     205    4,621
D   Greykite  < 2020     1,053,866   110,633,739,830      1,051,141   5.25   4,581
E   ARIMA     < 2020     208,415     43,437,169,639       89,738      -      5,880
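For reference, a minimal stdlib sketch of the table's metrics, assuming they follow the standard definitions (how they were computed is not stated, and MAPE is taken to be in percent).

```python
import math
from typing import Sequence


def _validate(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Require non-empty sequences of equal length."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error, in squared data units."""
    _validate(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    _validate(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    _validate(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L_hat)."""
    return 2 * n_params - 2 * log_likelihood
```

Because RMSE is the square root of MSE, each row can be sanity-checked: for row A, 446,152² ≈ 1.9905 × 10¹¹, matching its MSE. AIC is only comparable between models fitted to the same series, so models trained on < 2020 and on all BART data should not be ranked against each other by AIC.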

Conclusion:

Although Prophet and Greykite are extremely powerful and flexible, tried-and-true ARIMA produced the best model for forecasting ridership, both for data inclusive of COVID shutdowns and without. As a true (autoregressive) time-series model, ARIMA captured the changes better. Prophet and Greykite are built on standard linear regression, with parameters and coefficients that model holidays, seasons, weekends, etc., and the limitations of monthly data negate the power and flexibility of frameworks designed to model exactly those effects.
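A sketch of why, using the additive form described in Prophet's paper and documentation (Greykite's Silverkite model is also regression-based, though its feature set differs): a trend g(t), Fourier-series seasonality s(t) with period P, holiday effects h(t), and extra regressors x_j(t) such as fuel price (added in Prophet via `add_regressor`), each entering linearly.

```latex
y(t) = g(t) + s(t) + h(t) + \sum_{j} \beta_j x_j(t) + \varepsilon_t,
\qquad
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```

A zero-mean weekly pattern (P = 7 days) cancels over every full week, so in a monthly total only the 0 to 3 leftover days carry any weekly signal, and single-day holiday effects in h(t) are diluted across a whole month. Most of what these components are designed to capture therefore cannot be estimated from monthly data.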

Further Work: I am sourcing daily data from BART, so auto-regression on a daily, weekly, monthly and yearly basis should be straightforward. However, the limiting factor for in-depth exogenous analysis is still the granularity of such datasets. I am also working on mapping short- vs long-distance (commute) trips to assess whether COVID-related ridership differs between intra- and inter-city trips.
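A minimal sketch of lagged features for auto-regression on daily data, assuming a gap-free daily series. The lag choices are illustrative (30 days stands in for a month, 365 for a year), and rows containing `None` would typically be dropped before fitting.

```python
from typing import Dict, List, Optional, Sequence

# Illustrative lags in days: previous day, same weekday last week,
# roughly a month ago (30 days approximates a month), roughly the same date last year.
DEFAULT_LAGS = (1, 7, 30, 365)


def lag_features(
    series: Sequence[float], lags: Sequence[int] = DEFAULT_LAGS
) -> List[Dict[str, Optional[float]]]:
    """For each day t, map f"lag_{k}" to series[t - k], or None before the series starts."""
    return [
        {f"lag_{k}": series[t - k] if t - k >= 0 else None for k in lags}
        for t in range(len(series))
    ]
```

Each row can then be paired with `series[t]` as the target, which turns daily auto-regression into an ordinary regression problem that also accepts exogenous columns.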
