Flexible pipeline for modeling time series data with DeepAR, generating predictions, and post-processing the results.
- Initial grouping of data to desired level for forecasting
- Processing into the DeepAR JSON format
- Hyperparameter tuning job
- Batch Transform and post-processing of results
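As a rough sketch of the JSON processing step: DeepAR reads JSON Lines input, with one object per series holding a `start` timestamp and a `target` array. The helper name `to_json_lines`, the input layout, and the sample counts below are illustrative assumptions, not the code used in this repo.

```python
import json
from datetime import date


def to_json_lines(series):
    """Serialize {name: (start_date, values)} into DeepAR-style JSON Lines.

    Each output line is one JSON object with "start" and "target" keys.
    The series name is only used as the dict key and is not written out.
    """
    lines = []
    for _name, (start, values) in series.items():
        record = {
            # DeepAR expects timestamps formatted as YYYY-MM-DD HH:MM:SS
            "start": start.strftime("%Y-%m-%d 00:00:00"),
            "target": list(values),
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Made-up weekly counts, for illustration only
example = {
    "auto-theft": (date(2016, 1, 4), [5, 3, 8]),
    "theft-from-motor-vehicle": (date(2016, 1, 4), [1, 0, 2]),
}
print(to_json_lines(example))
```

Each printed line can be written to a file and uploaded as a training or test channel.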
Crime data is quite stochastic, but seasonal patterns prevail. Daily counts are too noisy to predict effectively, so forecasting at the weekly level is a more tractable problem.
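The weekly grouping can be sketched with the standard library alone. This is a minimal illustration that assumes weeks start on Monday and that incidents arrive as `(category, date)` pairs; the function names are hypothetical, and the actual pipeline may well use pandas instead.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day` (assumed convention)."""
    return day - timedelta(days=day.weekday())


def weekly_counts(incidents):
    """Count incidents per category per week.

    incidents: iterable of (category, date) pairs.
    Returns {category: {week_start_date: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for category, day in incidents:
        counts[category][week_start(day)] += 1
    return {cat: dict(weeks) for cat, weeks in counts.items()}


def to_dense_series(week_counts, first_week, n_weeks):
    """Expand sparse weekly counts into a gap-free list, using 0 for empty weeks."""
    return [
        week_counts.get(first_week + timedelta(weeks=i), 0)
        for i in range(n_weeks)
    ]
```

Filling empty weeks with zeros keeps every series on a regular weekly grid, which is what a `target` array needs.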
As fewer cars are on the roads, more are stolen or looted.
The data runs from January 2016 through early January 2021. To assess performance on a sizeable portion of the data, I trained through December 2019 and predicted the 52 weeks of 2020.
The maximum context length equals:
(training set length) - (prediction length) = 209 - 52 = 157 weeks
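The arithmetic above can be checked with a short sketch. It assumes Monday-start weeks, so the training set runs from the week of 2016-01-04 through the week of 2019-12-30; the function names are illustrative. One way to read the bound is that a single training window spans the context plus the prediction length, and that window has to fit inside the training series.

```python
from datetime import date


def n_weeks(first_week, last_week):
    """Number of weekly observations between two week-start dates, inclusive."""
    return (last_week - first_week).days // 7 + 1


def max_context_length(train_length, prediction_length):
    """Longest context such that context + prediction fits in the training series."""
    return train_length - prediction_length


# Assumed week boundaries for January 2016 through December 2019
train_length = n_weeks(date(2016, 1, 4), date(2019, 12, 30))
prediction_length = 52

print(train_length)                                         # 209
print(max_context_length(train_length, prediction_length))  # 157
```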
After a few iterations of hyperparameter tuning jobs, the results were not as good as expected. Factors to consider that affected model performance in 2020:
- Immediate drop in crime as a result of the Denver Covid shutdown in March
- Civil unrest and rioting in late May/early June
- Lower crime overall due to Covid lockdowns across 2020
Possible improvements:
- Add categorical features for each series (e.g. a broader group for similar categories such as "auto-theft" and "theft-from-motor-vehicle")
- Add dynamic features (e.g. binary indicators for scheduled political events and holidays)
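A sketch of how both ideas could map onto the DeepAR input format, which accepts an optional `cat` array of integer codes and an optional `dynamic_feat` array of per-time-step feature series. The broad groups, the "other" category, the helper names, and the sample counts are all hypothetical.

```python
import json
from datetime import date

# Hypothetical mapping from offense category to a broader group
BROAD_GROUPS = {
    "auto-theft": "vehicle",
    "theft-from-motor-vehicle": "vehicle",
    "other": "misc",  # placeholder category, not from the data
}


def encode_groups(mapping):
    """Assign each broad group an integer code, as the `cat` field requires."""
    codes = {group: i for i, group in enumerate(sorted(set(mapping.values())))}
    return {category: codes[group] for category, group in mapping.items()}


def weekly_indicator(first_week, n_weeks, event_dates):
    """Return a 0/1 list flagging the weeks that contain any of `event_dates`."""
    flags = [0] * n_weeks
    for day in event_dates:
        idx = (day - first_week).days // 7
        if 0 <= idx < n_weeks:
            flags[idx] = 1
    return flags


first_week = date(2016, 1, 4)
target = [5, 3, 8, 2]  # made-up weekly counts
holidays = [date(2016, 1, 18)]  # Martin Luther King Jr. Day 2016

record = {
    "start": first_week.strftime("%Y-%m-%d 00:00:00"),
    "target": target,
    "cat": [encode_groups(BROAD_GROUPS)["auto-theft"]],
    "dynamic_feat": [weekly_indicator(first_week, len(target), holidays)],
}
print(json.dumps(record))
```

For training, each `dynamic_feat` series has the same length as `target`. At inference time it also has to cover the prediction window, so the indicators must be known in advance, which suits scheduled events and holidays.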