Welcome to the Time Series Forecasting Project. The mission is to predict store sales with machine learning techniques, using data from Corporation Favorita, a major grocery retailer in Ecuador.
Below is a preview showcasing some features of the notebook.
The primary focus of this project is to utilize time series regression analysis to forecast sales for Corporation Favorita, a prominent grocery retailer based in Ecuador.
The objective is to develop a robust model capable of accurately forecasting future sales by leveraging the extensive time series data of thousands of products sold across various Corporation Favorita locations. The resulting forecasts will provide valuable insights to the store's management, enabling them to formulate effective inventory and sales plans.
I will utilize the CRISP-DM framework to execute the project.
The project includes the following stages:
- Azubi Africa's SQL Server database, GitHub repository and OneDrive.
N/B: The datasets from the sources above were saved in the root folder of this repository for easy access.
- Utilizing `pyodbc` for SQL data
- Leveraging `pandas` for CSV and Excel files
- Merging
- Duplicate checks
- Handling missing values
- Renaming Columns and Changing Datatypes
- Creating new features
- Visualizations
- Hypothesis testing
- ADF Testing
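As an illustration of the "Creating new features" step listed above, here is a minimal sketch of deriving calendar features from a date. The feature names and the helper `date_features` are assumptions made for this example, not the notebook's actual columns, and the sketch uses only the standard library rather than the `pandas` workflow of the project.

```python
from datetime import date


def date_features(d: date) -> dict:
    """Derive simple calendar features from a single date (hypothetical helper)."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),          # Monday = 0, Sunday = 6
        "is_weekend": d.weekday() >= 5,      # Saturday or Sunday
        "quarter": (d.month - 1) // 3 + 1,   # 1 to 4
    }


# Arbitrary example date, chosen only for illustration.
features = date_features(date(2016, 4, 16))
print(features)
```

Features like these let a regression model pick up weekly and seasonal patterns that a raw date column cannot express.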
Questions:
- Is the train dataset complete (has all the required dates)?
- Which dates have the lowest and highest sales for each year?
- Did the earthquake impact sales?
- Are certain groups of stores making more sales than others? (Cluster, city, state, type)
- Are sales affected by promotions, oil prices and holidays?
- What analysis can we get from the date and its extractable features?
- What is the difference between RMSLE, RMSE, MSE (or why is the MAE greater than all of them?)
- What is the total sales made each year by the corporation?
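The first question above, whether the train dataset contains every required date, can be checked by comparing the observed dates against a full daily calendar between the first and last date. The sketch below uses toy data and a hypothetical helper named `missing_dates`; it is not the notebook's code, only one way such a check might look.

```python
from datetime import date, timedelta


def missing_dates(observed, start, end):
    """Return the calendar days in [start, end] that are absent from `observed`."""
    observed_set = set(observed)
    n_days = (end - start).days + 1
    expected = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in expected if d not in observed_set]


# Toy data invented for this example: one day is deliberately left out.
observed = [
    date(2013, 12, 23),
    date(2013, 12, 24),
    date(2013, 12, 26),
    date(2013, 12, 27),
]
gaps = missing_dates(observed, min(observed), max(observed))
print(gaps)
```

An empty result would mean the series is complete over the chosen range; any returned dates are candidates for filling before modelling.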
Visualization Tools:
- Matplotlib
- Seaborn
- Data Splitting
- Feature Encoding with OneHotEncoder
- Feature Scaling with MinMaxScaler
- Linear Regression
- XGBoost
- CatBoost
- AutoReg
- ARIMA
- SARIMA
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Squared Logarithmic Error (MSLE)
- Root Mean Squared Logarithmic Error (RMSLE)
- RandomizedSearchCV
- os
- pickle
- save_model
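To make the evaluation metrics listed above (MAE, MSE, RMSE, MSLE, RMSLE) concrete, here is a small pure-Python sketch of their standard definitions. The function names and the sample values are assumptions for illustration; the notebook itself may rely on library implementations instead.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(mse(y_true, y_pred))


def msle(y_true, y_pred):
    """Mean Squared Logarithmic Error (inputs must be non-negative)."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error."""
    return math.sqrt(msle(y_true, y_pred))


# Made-up sales values, used only to exercise the functions.
y_true = [0, 10, 100]
y_pred = [0, 12, 90]

for name, fn in [("MAE", mae), ("MSE", mse), ("RMSE", rmse),
                 ("MSLE", msle), ("RMSLE", rmsle)]:
    print(f"{name}: {fn(y_true, y_pred):.4f}")
```

On the same predictions, RMSE is always at least as large as MAE because squaring weights large errors more heavily. The log-based metrics measure relative rather than absolute error, so their values are on a different scale and are not directly comparable with MAE or RMSE.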
To get started with this project, you'll need to install the following Python packages using `pip`:
pip install pyodbc sqlalchemy lightgbm catboost python-dotenv pandas numpy matplotlib seaborn scipy pmdarima
Make sure to have these packages installed before running the project. Follow these steps for installation:
- Clone this repository to your local machine.
- Install the required Python packages using pip:
pip install -r requirements.txt
You're now ready to dive into this exciting data journey!
This project was built by Chidiebere David Ogbonna.
Here are his details:
Detail | Link |
---|---|
Email | eberedavid326@gmail.com |
chidieberedavidogbonna | |
GitHub | iameberedavid |
Medium | eberedavid |
iameberedavid |
I went further to explain the work process of this project in an article published on Medium. Click on this link to find the article: TIME SERIES FORECASTING ANALYSIS FOR CORPORATION FAVORITA.
The model was embedded into a Streamlit app and deployed for public use. This was done in this GitHub repository: Embed-Corporation-Favorita-Timeseries-Model-To-Streamlit. The deployment process is discussed in this Medium article: BUILDING A USER-FRIENDLY SALES PREDICTION APP USING STREAMLIT AND HUGGINGFACE.
This project is licensed under the MIT License. See the LICENSE file for details.
I would like to express my gratitude to the Azubi Africa Data Analyst Program for offering valuable projects as part of this program. I also thank my scrum masters, Rachel Appiah-Kubi and Emmanuel Koupoh, for their support throughout the program.
Feel free to send your reviews, suggestions, questions and collaboration requests to eberedavid326@gmail.com