Model Transit Ridership as Fuel Prices Fluctuate

SUMMARY

I modelled ridership on its own and with a few additional regressors. Various models built with Facebook's Prophet and LinkedIn's Greykite frameworks produced good results on data prior to the early COVID-19 pandemic shutdowns, but they and the ARIMA models failed to capture the precipitous, unprecedented drop in monthly ridership and the exceedingly slow rise that followed.

I included average fuel price, the increase in the number of registered cars, and consumer debt as additional regressors in both Prophet and Greykite, but none had a significant effect at that resolution.

Sourcing local data, at the ZIP-code or even county level, would likely yield better results.

=== CURRENTLY WORKING ON HOURLY DATA.

The equally unprecedented fuel prices illustrate one of the main challenges in time-series modeling. They prompted a very simple plot comparing the rate of change of at-the-pump prices with the rate of change of petroleum barrel prices, and with wheat and bread prices.
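A minimal sketch of the rate-of-change comparison, assuming "rate of change" means the period-over-period percent change. The function name `pct_change` and the sample values are hypothetical and for illustration only; they are not taken from the notebooks or from real price data.

```python
from typing import List, Optional


def pct_change(series: List[float]) -> List[Optional[float]]:
    """Period-over-period percent change; the first entry has no predecessor."""
    if not series:
        return []
    changes: List[Optional[float]] = [None]
    for prev, curr in zip(series, series[1:]):
        # A zero-valued previous period has no defined percent change.
        changes.append(None if prev == 0 else (curr - prev) / prev * 100.0)
    return changes


# Hypothetical monthly values, for illustration only (not real data).
pump_price = [4.00, 4.20, 4.62]    # USD per gallon
barrel_price = [80.0, 88.0, 96.8]  # USD per barrel

pump_roc = pct_change(pump_price)
barrel_roc = pct_change(barrel_price)

# Print the two rates side by side for each period after the first.
for pump, barrel in zip(pump_roc[1:], barrel_roc[1:]):
    print(f"pump {pump:+.1f}%  barrel {barrel:+.1f}%")
```

Plotting the two resulting series on one axis shows whether pump prices track barrel prices or lag behind them.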

A little bit of Econ.

DATA

Data came from various sources, through API requests and spreadsheets obtained directly from state sources. Details can be found in the data acquisition notebooks.

Exploratory Data Analyses

I've set up interactive visualizations using Plotly on my website. Ridership began declining in late 2018, so initial forecasts predicted lower ridership in 2020 even before the pandemic.

Summary statistics for daily, weekly, monthly and annual ridership are shown below.

MODELS:

Model performance visualized below

A grid search over ARIMA orders yielded an ARIMA(4,1,4) model.
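For reference, this is what the order (4,1,4) means, written with the backshift operator B (where B y_t = y_{t-1}). The plus sign on the moving-average terms is one common convention and is an assumption here; some texts use a minus sign. The optional constant c and the fitted coefficients are not reproduced in this README.

```latex
\left(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4\right)(1 - B)\, y_t
  = c + \left(1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3 + \theta_4 B^4\right)\varepsilon_t
```

Here y_t is monthly ridership, (1 - B) y_t is the single difference (d = 1), the four \phi terms are the autoregressive part (p = 4), and the four \theta terms are the moving-average part (q = 4) applied to the white-noise errors \varepsilon_t.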

Greykite model with extra regressors performed best on data prior to COVID

The Prophet model predicted the downward trend poorly, even before the COVID decrease

PROPHET framework forecast within confidence

Greykite framework forecast pre-pandemic

Greykite framework components for all data

ID	MODEL	DATA	RMSE	MSE	MAE	MAPE	AIC
A	Prophet	< 2020	446 152	199 052 198 567	375 686		
B	Prophet	All BART	1 243 269	5 457 200 928 927	1 181 450		
C	Greykite	All BART	911 443	830 729 769 011	909 234	205	4 621
D	Greykite	< 2020	1 053 866	110 633 739 830	1 051 141	5.25	4 581
E	ARIMA	< 2020	208 415	43 437 169 639	89 738		5 880
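The error columns in the table are assumed to follow the standard definitions, where RMSE is the square root of MSE and MAPE is expressed in percent. The sketch below shows those definitions in plain Python. The function name `forecast_errors` and the sample numbers are hypothetical, and the notebooks may compute these metrics differently (for example through a library, or with MAPE as a fraction).

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MSE, MAE and MAPE (in percent) for paired observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equally long")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    # MAPE is undefined where the actual value is zero, so those points are skipped.
    nonzero = [(a, r) for a, r in zip(actual, residuals) if a != 0]
    if nonzero:
        mape = sum(abs(r / a) for a, r in nonzero) / len(nonzero) * 100.0
    else:
        mape = float("nan")
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae, "MAPE": mape}


# Hypothetical ridership counts, for illustration only (not real data).
actual_riders = [100.0, 200.0, 400.0]
predicted_riders = [110.0, 190.0, 380.0]
errors = forecast_errors(actual_riders, predicted_riders)
for name, value in errors.items():
    print(f"{name}: {value:.3f}")
```

Because RMSE, MSE and MAE depend on the scale of the data, they are only comparable between rows fitted on the same DATA subset. MAPE is scale-free but breaks down when ridership is close to zero.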

Conclusion:

Although Prophet and Greykite are extremely powerful and flexible, tried-and-true ARIMA produced the best model for forecasting ridership, both with and without the data covering the COVID shutdowns. As a true time-series model, it captured the changes better. Prophet and Greykite are built as standard linear regressors, with parameters and coefficients to model holidays, seasons, weekends, etc.; the limitations of monthly data negate the power and flexibility of frameworks designed to model just that.

Further Work: I am sourcing daily data from BART, so auto-regression on a daily, weekly, monthly and yearly basis should be straightforward. However, the limiting factor for in-depth exogenous analysis is still the granularity of such datasets. I am also working on mapping short- vs long-distance (commute) trips to assess whether COVID-related ridership differs between intra- and inter-city trips.
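Once daily counts are available, the coarser series can be derived from them by summing. The sketch below is one way to do that with the standard library, assuming the daily data arrives as (date, riders) pairs. The function names and sample counts are hypothetical and not taken from the BART data.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

DailyCounts = Iterable[Tuple[date, int]]


def monthly_totals(daily: DailyCounts) -> Dict[Tuple[int, int], int]:
    """Sum daily riders into (year, month) buckets."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, riders in daily:
        totals[(day.year, day.month)] += riders
    return dict(totals)


def weekly_totals(daily: DailyCounts) -> Dict[Tuple[int, int], int]:
    """Sum daily riders into (ISO year, ISO week) buckets."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, riders in daily:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        totals[(iso[0], iso[1])] += riders
    return dict(totals)


# Hypothetical daily counts, for illustration only (not real data).
sample = [
    (date(2020, 1, 30), 100),
    (date(2020, 1, 31), 200),
    (date(2020, 2, 1), 50),
    (date(2020, 2, 3), 300),
]
by_month = monthly_totals(sample)
by_week = weekly_totals(sample)
print(by_month)
print(by_week)
```

ISO weeks can straddle a month boundary, as the sample shows, so weekly and monthly totals do not nest cleanly. Both should be derived from the daily series rather than from each other.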
