This repo explores two complementary Kaggle competitions:
- M5 Forecasting - Accuracy
  - point estimates of the number of units sold
- M5 Forecasting - Uncertainty
  - probability distributions around those point estimates
Both look at the hierarchical sales data from Walmart to forecast daily sales for the next 28 days.
Go to the wiki for more details on the problem set and modeling work.
- Install Python, minimum 3.7.5 (pyenv is the easiest way to install specific versions)
- Python Environment/Dependency Management via poetry
  - Install poetry - instructions
- DVC for Data Pipeline Management
  - Install dvc - instructions
- Jupyter Notebook environment via Docker
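Taken together, a first-time setup might look like the sketch below (this assumes pyenv and the official poetry installer; adapt it to your own tooling):

```bash
# Install and pin the minimum supported Python version via pyenv
pyenv install 3.7.5
pyenv local 3.7.5

# Install poetry (see the poetry docs for the current recommended installer)
curl -sSL https://install.python-poetry.org | python3 -

# Create the virtualenv and install the project's dependencies
poetry install

# dvc may already come in through the project's dependencies; verify with:
poetry run dvc --version
```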
Many of the opinions in this setup were pulled from here.
Because of the repository structure and the use of poetry for dependency management, you can leverage auto-completion within the IDE and reuse that same code within the Jupyter notebooks. In addition, the Python packages you pull in will be the same versions between your IDE and the Jupyter notebooks.
If using PyCharm, you'll need to set the project interpreter to the virtualenv that was created when the dependencies were installed. To do this, run `poetry env list --full-path` to get the full directory path of the Python environment.
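For example, on macOS/Linux the interpreter to select lives under the printed path:

```bash
# Print the full path(s) of the poetry-managed virtualenv(s)
poetry env list --full-path

# In PyCharm, point the project interpreter at <printed path>/bin/python
```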
Navigate into the root of the cloned directory and run `docker-compose up` to launch a Jupyter Notebook server. If you change any dependencies through poetry, you'll need to rebuild the container via `docker-compose build`.
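The typical loop after editing dependencies is:

```bash
# Rebuild the image so the container picks up the updated dependencies,
# then relaunch the Jupyter server
docker-compose build
docker-compose up
```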
Note that the Jupyter Notebook server by default sets its root to the repository root, so all files used for notebook development should live within the repository.
If you'd like a different port for your Jupyter server, you can run `JUPYTER_PORT=<new port> docker-compose up` to use whatever port you'd like. The default messaging will still say port 8888, but the server is allocated to whatever port you pass in.
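For example, to serve on port 9999 (the port number here is arbitrary):

```bash
JUPYTER_PORT=9999 docker-compose up
# Startup logs will still mention 8888, but the server is reachable on 9999
```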
Leverage the following code snippet to make the `src` directory importable in notebooks:
```python
import sys

# This should resolve to the repository root, making src/ importable
sys.path.append('../')

%reload_ext autoreload
%autoreload 2
```
All data are obtained from the Kaggle website here:
- `calendar.csv` - Contains information about the dates on which the products are sold.
- `sales_train_validation.csv` - Contains the historical daily unit sales data per product and store [d_1 - d_1913].
- `sample_submission.csv` - The correct format for submissions. Reference the Evaluation tab for more info.
- `sell_prices.csv` - Contains information about the price of the products sold per store and date.
- `sales_train_evaluation.csv` - Available one month before the competition deadline; will include sales [d_1 - d_1941].
See `notebook/exploratory/0.1-alee-initial-eda.ipynb` for details about each of these files.
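If you'd rather fetch the files from the command line, a sketch like the following works, assuming the kaggle CLI is installed and authenticated (the data/raw destination is illustrative, not a path mandated by this repo):

```bash
# Download the competition data and unzip it into data/raw
kaggle competitions download -c m5-forecasting-accuracy -p data/raw
unzip -o data/raw/m5-forecasting-accuracy.zip -d data/raw
```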