A machine learning project that uses multivariate time series to forecast sales, built with Python and scikit-learn, and designed for Corporación Favorita and other retail companies aiming to forecast sales and optimize product inventory.
Corporación Favorita aims to optimize its inventory management by accurately forecasting the demand for various products across its stores in Ecuador. The goal is to ensure that each store has the right quantity of products in stock to meet customer demand while minimizing overstocking and stockouts.
The objective is to build machine learning models that accurately predict unit sales for different product families at Favorita stores. These models will help optimize inventory levels, improve sales forecasting accuracy, and ultimately enhance customer satisfaction by ensuring product availability.
This project follows the CRoss Industry Standard Process for Data Mining (CRISP-DM).
- Jupyter Notebook containing data analysis, visualizations, and interpretation.
- Detailed documentation outlining methodology, data sources, and analysis results.
- Interactive visualizations in Power BI and Dash showcasing sales trends and key insights.
- Anaconda
- Power BI
- Python
- Polars
- Dash
- Plotly
- plotly_resampler
- Jupyter Notebooks
- Git
- SciPy
- scikit-learn
- XGBoost
- LinearRegression
- HistGradientBoostingRegressor
- Prophet
- pyodbc
- re
- typing
- Kaleido
conda env create -f favorita_env.yml
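Once the environment is built, activate it before launching the notebooks, e.g. `conda activate favorita_env` (the environment name comes from the `name:` field inside `favorita_env.yml`; `favorita_env` is assumed here).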
```python
import polars as pl
from typing import List


def create_date_features(df: pl.DataFrame) -> pl.DataFrame:
    # Cast the 'date' column to a proper date type
    df = to_date(df, 'date')
    # Year
    # Extract the year from the date and add it to the df as a new column
    df = df.with_columns(pl.col('date').dt.year().alias('year'))
    # Extract the ordinal day of the year and add it to the df as a new column
    df = df.with_columns(pl.col('date').dt.ordinal_day().alias('day_of_year'))
    # Flag the first day of the year
    df = df.with_columns((pl.col('day_of_year') == 1).alias('is_year_start'))
    # Flag the last day of the year (31 December)
    df = df.with_columns(((pl.col('date').dt.month() == 12) & (pl.col('date').dt.day() == 31)).alias('is_year_end'))
    # Month
    # Extract the month from the date (1-12)
    df = df.with_columns(pl.col('date').dt.month().alias('month'))
    # Flag the first day of the month
    df = df.with_columns((pl.col('date') == pl.col('date').dt.month_start()).alias('is_month_start'))
    # Flag the last day of the month
    df = df.with_columns((pl.col('date') == pl.col('date').dt.month_end()).alias('is_month_end'))
    # Extract the month name
    df = df.with_columns(pl.col('date').dt.to_string('%B').alias('month_name'))
    # Day
    # Extract the day of the month
    df = df.with_columns(pl.col('date').dt.day().alias('day'))
    # Extract the day name
    df = df.with_columns(pl.col('date').dt.to_string('%A').alias('day_name'))
    # Quarter
    # Extract the quarter (1-4)
    df = df.with_columns(pl.col('date').dt.quarter().alias('quarter'))
    # Flag the first day of the quarter (first day of the quarter's first month)
    df = df.with_columns((((((pl.col('quarter') - 1) * 3) + 1) == pl.col('month')) & (pl.col('is_month_start'))).alias('is_quarter_start'))
    # Flag the last day of the quarter (last day of the quarter's last month)
    df = df.with_columns((((pl.col('quarter') * 3) == pl.col('month')) & (pl.col('is_month_end'))).alias('is_quarter_end'))
    # Week
    # Extract the ISO week number (1-53)
    df = df.with_columns(pl.col('date').dt.week().alias('week'))
    # Extract the weekday (1 = Monday, 7 = Sunday)
    df = df.with_columns(pl.col('date').dt.weekday().alias('week_day'))
    # Flag weekends (Saturday and Sunday)
    df = df.with_columns(pl.col('week_day').is_in([6, 7]).alias('is_week_end'))
    return df
```
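A minimal usage sketch for the function above on a toy frame. The repo's `to_date` helper is defined elsewhere; the stand-in below, which simply casts a string column to `pl.Date`, is an assumption for illustration only:

```python
import polars as pl

# Hypothetical stand-in for the repo's to_date helper: cast a string column to pl.Date
def to_date(df: pl.DataFrame, col: str) -> pl.DataFrame:
    return df.with_columns(pl.col(col).str.to_date())

sample = pl.DataFrame({
    'date': ['2016-12-30', '2016-12-31', '2017-01-01'],
    'sales': [100.0, 180.0, 0.0],
})

features = create_date_features(sample)
print(features.select(['date', 'year', 'quarter', 'is_year_start', 'is_year_end', 'is_week_end']))
```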
```python
def create_lag_features(df: pl.DataFrame, lag_days: List[int] = [1, 6, 7], rows_per_day: int = 1782, lag_factor: int = 91) -> pl.DataFrame:
    # Initialize a list to hold the lag columns
    lag_columns = []
    # Create each lag feature; fill nulls with zero since the earlier values are not known
    # (or are probably zero if the first date is the inception of the series)
    for i in lag_days:
        lag_col = df['sales'].shift(i * rows_per_day * lag_factor).alias(f'lag_{i}').fill_null(strategy='zero')
        lag_columns.append(lag_col)
    # Combine the original df with the lag columns. Drop rows with null values in the lag
    # columns in case any were not filled (probably redundant since they were filled)
    lagged_df = df.with_columns(lag_columns).drop_nulls([col.name for col in lag_columns])
    return lagged_df
```
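A tiny illustration of the lag mechanics, using made-up numbers: with `rows_per_day=1` and `lag_factor=1`, `lag_1` is simply the previous row's sales, and the first row is null-filled with zero.

```python
import polars as pl

toy = pl.DataFrame({'sales': [10.0, 20.0, 30.0, 40.0]})
print(create_lag_features(toy, lag_days=[1], rows_per_day=1, lag_factor=1))
# lag_1 comes out as [0.0, 10.0, 20.0, 30.0]
```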
```python
def create_all_features(df):
    # Handle array inputs coming from a scikit-learn pipeline by rebuilding the DataFrame
    if not isinstance(df, pl.DataFrame):
        schema = ['id', 'store_nbr', 'family', 'sales', 'onpromotion', 'city', 'state', 'store_type', 'cluster',
                  'promotion', 'holiday', 'year', 'day_of_year', 'is_year_start', 'is_year_end', 'month', 'is_month_start',
                  'is_month_end', 'month_name', 'day', 'day_name', 'quarter', 'is_quarter_start', 'is_quarter_end', 'week',
                  'week_day', 'is_week_end', 'lag_1', 'lag_6', 'lag_7', 'oil_price']
        df = pl.DataFrame(data=df, schema=schema)
    # Apply the feature-engineering steps in sequence, then drop columns that are no longer needed
    return column_dropper(
        create_oil_feature(
            create_lag_features(
                create_date_features(
                    create_holiday_feature(
                        create_promotion_feature(
                            create_store_features(fix_col_types(df))
                        )
                    )
                )
            )
        )
    )
```
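Because `create_all_features` rebuilds a `pl.DataFrame` when handed a plain array, it can be wrapped with scikit-learn's `FunctionTransformer` and used as a pipeline step. The snippet below is a sketch under that assumption, not the project's exact pipeline; `raw_df` stands in for the merged training frame:

```python
from sklearn.preprocessing import FunctionTransformer

# Hypothetical wrapper around the feature-engineering chain above
feature_engineering = FunctionTransformer(create_all_features)

# FunctionTransformer does not validate or convert its input by default,
# so a Polars DataFrame (here the assumed raw_df) is passed straight to create_all_features
engineered = feature_engineering.fit_transform(raw_df)
```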
- Fork the repository and clone it to your local machine.
- Explore the Jupyter Notebooks and documentation.
- Implement enhancements, fix bugs, or propose new features.
- Submit a pull request with your changes, ensuring clear descriptions and documentation.
- Participate in discussions, provide feedback, and collaborate with the community.
Feedback, suggestions, and contributions are welcome! Feel free to open an issue for bug reports, feature requests, or general inquiries. For additional support or questions, you can connect with me on LinkedIn.
Link to article on Medium: From Meeting Product Demands to transforming Inventory Management: Forecasting Sales in Ecuador's Leading Retailer, Corporación Favorita.
🕺🏻Gabriel Okundaye
- GitHub: GitHub Profile
- LinkedIn: LinkedIn Profile
If you like this project kindly show some love, give it a 🌟 STAR 🌟. Thank you!
This project is MIT licensed.