Time Series Forecasting Analysis For Corporation Favorita ⏰

Welcome to the Time Series Forecasting Project. The mission is to predict store sales using data from Corporation Favorita, a major grocery retailer in Ecuador, using Machine Learning techniques.

Preview 🔍

Below is a preview showcasing some features of the notebook.

[Screenshots: top, middle, and bottom sections of the notebook]

Project Description 📋

The primary focus of this project is to utilize time series regression analysis to forecast sales for Corporation Favorita, a prominent grocery retailer based in Ecuador.

The objective is to develop a robust model capable of accurately forecasting future sales by leveraging the extensive time series data of thousands of products sold across various Corporation Favorita locations. The resulting forecasts will provide valuable insights to the store's management, enabling them to formulate effective inventory and sales plans.

I will utilize the CRISP-DM framework to execute the project.

Project Overview 🖼

The project includes the following stages:

1. 📥 Data Collection

  • Azubi Africa's SQL Server database, GitHub repository and OneDrive.

N.B.: The datasets from the sources above were saved in the root folder of this repository for easy access.

2. 📚 Data Loading

  • Utilizing pyodbc for SQL data
  • Leveraging pandas for CSV and Excel files
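
The loading step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the environment variable names (`DB_SERVER`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`), the ODBC driver name, and the helper function names are all assumptions.

```python
import os


def build_connection_string(env=os.environ):
    """Assemble an ODBC connection string from environment variables.

    The variable names and the driver name are assumptions for illustration,
    not the project's actual settings.
    """
    return (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        f"SERVER={env['DB_SERVER']};"
        f"DATABASE={env['DB_NAME']};"
        f"UID={env['DB_USER']};"
        f"PWD={env['DB_PASSWORD']}"
    )


def load_sql_table(query, env=os.environ):
    """Run a query against SQL Server and return the result as a DataFrame."""
    # Imported lazily so the helper above works without a database driver.
    import pandas as pd
    import pyodbc

    connection = pyodbc.connect(build_connection_string(env))
    try:
        return pd.read_sql(query, connection)
    finally:
        connection.close()


def load_flat_file(path):
    """Load a CSV or Excel file with pandas, chosen by file extension."""
    import pandas as pd

    if path.lower().endswith((".xlsx", ".xls")):
        return pd.read_excel(path)
    return pd.read_csv(path)
```

Keeping credentials in environment variables (for example, loaded from a `.env` file with python-dotenv) avoids committing them to the repository.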

3. 💡 Exploratory Data Analysis (EDA)

  • Merging
  • Duplicate checks
  • Handling missing values
  • Renaming Columns and Changing Datatypes
  • Creating new features
  • Visualizations
  • Hypothesis testing
  • ADF Testing
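
As an illustration of the "Creating new features" step, calendar features can be derived from the date column. The sketch below uses only the standard library; the feature names are assumptions, and in a notebook the pandas `.dt` accessor would typically do the same job on a whole column.

```python
from datetime import date


def date_features(day):
    """Derive calendar features from a single date."""
    return {
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "day_of_week": day.weekday(),  # Monday = 0, Sunday = 6
        "week_of_year": day.isocalendar()[1],  # ISO week number
        "quarter": (day.month - 1) // 3 + 1,
        "is_weekend": day.weekday() >= 5,
    }


# Example date chosen for illustration only.
features = date_features(date(2016, 4, 16))
```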

4. 📈 Answering Questions with Visualizations

Questions:

  • Is the train dataset complete (has all the required dates)?
  • Which dates have the lowest and highest sales for each year?
  • Did the earthquake impact sales?
  • Are certain groups of stores making more sales than others? (Cluster, city, state, type)
  • Are sales affected by promotions, oil prices and holidays?
  • What analysis can we get from the date and its extractable features?
  • What is the difference between RMSLE, RMSE, MSE (or why is the MAE greater than all of them?)
  • What is the total sales made each year by the corporation?

Visualization Tools:

  • Matplotlib
  • Seaborn
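
The first question in this stage (whether the train dataset has all the required dates) can be checked by comparing the observed dates against a full daily calendar between the first and last date. The sketch below is a standard-library illustration with toy dates; the function name is hypothetical.

```python
from datetime import date, timedelta


def missing_dates(observed):
    """Return the calendar days between the earliest and latest observed
    date that do not appear in `observed`."""
    present = set(observed)
    if not present:
        return []
    start, end = min(present), max(present)
    span = (end - start).days
    return [
        start + timedelta(days=offset)
        for offset in range(span + 1)
        if start + timedelta(days=offset) not in present
    ]


# Toy data: one day is absent from an otherwise continuous range.
toy_dates = [date(2020, 12, 23), date(2020, 12, 24), date(2020, 12, 26)]
gaps = missing_dates(toy_dates)
```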

5. ⚙️ Feature Engineering

  • Data Splitting
  • Feature Encoding with OneHotEncoder
  • Feature Scaling with MinMaxScaler
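
A minimal sketch of the splitting and scaling ideas, assuming a chronological (unshuffled) split, which is a common choice for time series but is not confirmed by this README. The scaling function is a hand-written stand-in that applies the same (x - min) / (max - min) formula as `MinMaxScaler` with its default range, fitted on the training portion only. One-hot encoding is not covered here.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into train and validation without shuffling."""
    cutoff = int(len(rows) * train_fraction)
    return rows[:cutoff], rows[cutoff:]


def fit_min_max(values):
    """Return a function that maps the training minimum to 0 and maximum to 1."""
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant columns
    return lambda x: (x - low) / span


sales = [10.0, 20.0, 15.0, 30.0, 50.0]  # toy values, already in date order
train, valid = chronological_split(sales, train_fraction=0.8)

scale = fit_min_max(train)  # fitted on training data only
train_scaled = [scale(x) for x in train]
valid_scaled = [scale(x) for x in valid]
```

Fitting the scaler on the training rows only keeps information from the validation period out of the model inputs. As a consequence, validation values outside the training range can scale to below 0 or above 1.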

6. ⌛ Model Training

  • Linear Regression
  • XGBoost
  • CatBoost
  • AutoReg
  • ARIMA
  • SARIMA

7. 📊 Model Evaluation

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Mean Squared Logarithmic Error (MSLE)
  • Root Mean Squared Logarithmic Error (RMSLE)
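
For reference, the five metrics can be written out in a few lines of plain Python. This is an illustrative sketch of the standard definitions, not the notebook's evaluation code; the logarithmic variants use log(1 + x) so that zero sales are handled.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(mse(y_true, y_pred))


def msle(y_true, y_pred):
    """Mean Squared Logarithmic Error (inputs must be non-negative)."""
    return sum(
        (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error."""
    return math.sqrt(msle(y_true, y_pred))
```

MAE and RMSE are in the same units as sales, MSE is in squared units, and MSLE and RMSLE are computed on a log scale, so the five values are not directly comparable in magnitude.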

8. 🎯 Hyperparameter Tuning

  • RandomizedSearchCV
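
A hedged sketch of a randomized search with scikit-learn's `RandomizedSearchCV`. The estimator (`Ridge`), the parameter values, the synthetic data, and the use of `TimeSeriesSplit` are all assumptions chosen to keep the example small; the notebook's tuned model and search space may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    n_iter=4,  # number of parameter settings sampled
    scoring="neg_root_mean_squared_error",
    cv=TimeSeriesSplit(n_splits=3),  # folds respect row order
    random_state=0,
)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
```

`TimeSeriesSplit` always validates on rows that come after the training rows, which suits data ordered by date.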

9. 🤔 Prediction on Validation Set

10. 📒 Prediction on Test Dataset

11. 💭 Exportation

  • os
  • pickle
  • save_model
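
The export step with `os` and `pickle` might look like the sketch below. The function names, directory, and file name are hypothetical. For gradient-boosted models, the library's own `save_model` method is an alternative to pickling.

```python
import os
import pickle


def export_object(obj, directory, filename):
    """Pickle `obj` into `directory`, creating the directory if needed."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


def load_object(path):
    """Load a previously pickled object from `path`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.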

Installation and Setup 🔧

To get started with this project, you'll need to install the following Python packages using pip:

pip install pyodbc sqlalchemy lightgbm catboost python-dotenv pandas numpy matplotlib seaborn scipy pmdarima

Make sure these packages are installed before running the notebook. To set up the project from this repository:

  1. Clone this repository to your local machine.
  2. Install the required Python packages using pip:
    pip install -r requirements.txt

You're now ready to dive into this exciting data journey!

Author 👨‍💼

This project was built by Chidiebere David Ogbonna.

Here are his details:

  • Email: eberedavid326@gmail.com
  • LinkedIn: chidieberedavidogbonna
  • GitHub: iameberedavid
  • Medium: eberedavid
  • Twitter: iameberedavid

Article Publication 📄

I explained the workflow of this project in more detail in an article published on Medium: TIME SERIES FORECASTING ANALYSIS FOR CORPORATION FAVORITA.

App Deployment 📲

The model was embedded into a Streamlit app and deployed for public use. The app lives in a separate GitHub repository: Embed-Corporation-Favorita-Timeseries-Model-To-Streamlit. The deployment process is discussed in this Medium article: BUILDING A USER-FRIENDLY SALES PREDICTION APP USING STREAMLIT AND HUGGINGFACE.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments 🙏

I would like to express my gratitude to the Azubi Africa Data Analyst Program for offering valuable projects as part of this program. I also thank my scrum masters, Rachel Appiah-Kubi and Emmanuel Koupoh, for their support throughout the program.

Contact

Feel free to send your reviews, suggestions, questions and collaboration requests to eberedavid326@gmail.com
