Case iFood - Data Analyst Case

The Company

Consider a well-established company operating in the retail food sector. Presently they have several hundred thousand registered customers and serve almost one million consumers a year. They sell products from 5 major categories: wines, rare meat products, exotic fruits, specially prepared fish and sweet products. These can further be divided into gold and regular products. Customers can order and acquire products through 3 sales channels: physical stores, catalogs and the company’s website. Overall, the company has had solid revenues and a healthy bottom line in the past 3 years, but the profit growth prospects for the next 3 years are not promising. For this reason, several strategic initiatives are being considered to reverse this situation. One is to improve the performance of marketing activities, with a special focus on marketing campaigns.

The original text is here: PDF

This project aims to improve my skills in machine learning, classification and clustering. It is based on the iFood Data Analyst selection process available in this repository.

Objectives

The objective of this project is to practice structuring a data project around a real-life case.

The analysis highlighted the importance of exploratory data analysis and of preprocessing.

Detailed objectives:

Build a robust exploratory analysis.
Segment customers from the provided database.
Build a classification model to predict whether a customer will purchase the product offered in the campaign.
Present a Data Science project structure, using notebooks, scripts, reports and a repository on GitHub.
Present good programming practices in Python, such as the use of functions and script files to facilitate code reuse.
Show good practices for using Scikit-Learn, such as pipelines and hyperparameter optimization.

Repository structure


├── .gitignore          <- Files and directories to be ignored by Git
├── environment.yml     <- Requirements file to reproduce the analysis environment
├── LICENSE             <- License
├── README.md           <- Main README for developers using this project
│
├── data                <- Data files for the project
│
├── model               <- Trained and serialized models, model predictions or model summaries
│
├── notebooks           <- Jupyter notebooks. The naming convention is a number (for sorting)
│   │
│   └── src             <- Source code for use in this project
│       │
│       ├── __init__.py <- Makes src a Python module
│       ├── config.py   <- Basic project settings
│       └── graphics.py <- Scripts for creating exploratory and results-oriented visualizations
│
├── references          <- Data dictionaries, manuals and all other explanatory materials
│
└── images              <- Graphs and figures generated for use in reports
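As an illustration of how config.py can support code reuse, here is a minimal sketch of what it might contain. This is an assumption, not the project's actual file: it only centralizes the folder paths from the tree above, and the constant names (PROJECT_FOLDER, DATA_FOLDER, RANDOM_STATE, ...) are hypothetical.

```python
# Hypothetical sketch of notebooks/src/config.py; constant names are illustrative.
from pathlib import Path

# Assumes this file lives at <project root>/notebooks/src/config.py,
# so the project root is three .parent steps up from the file itself.
PROJECT_FOLDER = Path(__file__).resolve().parent.parent.parent

# Folders taken from the repository tree.
DATA_FOLDER = PROJECT_FOLDER / "data"
MODEL_FOLDER = PROJECT_FOLDER / "model"
REFERENCES_FOLDER = PROJECT_FOLDER / "references"
IMAGES_FOLDER = PROJECT_FOLDER / "images"

# Hypothetical shared seed so notebooks reproduce the same splits and clusters.
RANDOM_STATE = 42
```

Because src sits inside notebooks, a notebook in that folder could then use `from src.config import DATA_FOLDER` instead of hard-coding paths.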

Details of the dataset used and summary of the results

A detailed description of the dataset used is available here.

Using a pipeline with preprocessing, PCA and K-Means, the customer base was segmented into 3 clusters:

Figure: Cluster profiles

Cluster analysis:

Cluster 0:

High income
High spending
Very likely to have no children
More likely to accept campaigns
Cluster without people with basic education
No age profile that stands out

Cluster 1:

Low income
Low spending
Likely to have children
Low propensity to accept campaigns
Only cluster with a significant percentage of people with basic education
Younger people

Cluster 2:

Intermediate income
Intermediate spending
Likely to have children
May accept campaigns
Older people

Subsequently, three classification models were trained to predict whether a customer will purchase the product offered in the campaign. The models used were:

Logistic Regression
Decision Tree
KNN

A DummyClassifier was used as a baseline. The models were compared on 6 metrics.
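A comparison of this kind can be sketched with Scikit-Learn's `cross_validate`. The sketch assumes each model is wrapped in a pipeline with standardization and evaluated with stratified 5-fold cross-validation on a synthetic, imbalanced stand-in for the campaign response. The six scoring names below are illustrative choices, not necessarily the metrics used in the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 42

# Synthetic, imbalanced binary target standing in for "accepted the campaign".
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85], random_state=RANDOM_STATE
)

models = {
    "DummyClassifier": DummyClassifier(strategy="stratified", random_state=RANDOM_STATE),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=RANDOM_STATE),
    "KNeighborsClassifier": KNeighborsClassifier(),
}

# Illustrative metric set; each name is a built-in Scikit-Learn scorer.
scoring = ["accuracy", "balanced_accuracy", "precision", "recall", "f1", "roc_auc"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE)

results = {}
for name, model in models.items():
    # Same preprocessing for every model keeps the comparison like-for-like.
    pipeline = Pipeline([("scaler", StandardScaler()), ("clf", model)])
    scores = cross_validate(pipeline, X, y, cv=cv, scoring=scoring)
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

comparison = pd.DataFrame.from_dict(results, orient="index").round(3)
print(comparison)
```

Scaling matters for Logistic Regression and KNN but is harmless for the Decision Tree, which is why one shared pipeline shape is used for all candidates.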

Based on this comparison, the Logistic Regression model was chosen to undergo hyperparameter optimization.
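Hyperparameter optimization of a Logistic Regression pipeline can be sketched with `GridSearchCV`. The grid values, the `average_precision` scoring (a common choice for imbalanced targets) and the synthetic data are assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42

# Synthetic, imbalanced stand-in for the campaign response.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.85], random_state=RANDOM_STATE
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The "clf__" prefix routes each parameter to the LogisticRegression step.
param_grid = {
    "clf__C": np.logspace(-3, 2, 6),
    "clf__class_weight": [None, "balanced"],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="average_precision",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=RANDOM_STATE),
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"CV score: {search.best_score_:.3f}")
# The refitted best pipeline is scored on data it never saw during the search.
print(f"Test score: {search.score(X_test, y_test):.3f}")
```

Keeping the scaler inside the searched pipeline means it is refit on each training fold, so no information from the validation folds leaks into preprocessing.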

How to reproduce the project

The project was developed using Python 3.13.2. To reproduce it, create a virtual environment with Conda (or a similar tool) using Python 3.13.2 and install the libraries listed below:

Versions of the packages:


 Package             |  Version  
-------------------- | ----------
Imbalanced-Learn     |     0.13.0
Matplotlib           |     3.10.1
NumPy                |      2.2.4
Pandas               |      2.2.3
Scikit-Learn         |      1.6.1
Seaborn              |     0.13.2

Python version: 3.13.2
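To check that an environment matches the table, a small script along these lines could be used. It relies only on the standard library (`importlib.metadata`, `platform`); the mapping from the table to PyPI distribution names is an assumption.

```python
from __future__ import annotations

import platform
from importlib.metadata import PackageNotFoundError, version

EXPECTED_PYTHON = "3.13.2"

# Versions from the table, keyed by (assumed) PyPI distribution name.
EXPECTED_PACKAGES = {
    "imbalanced-learn": "0.13.0",
    "matplotlib": "3.10.1",
    "numpy": "2.2.4",
    "pandas": "2.2.3",
    "scikit-learn": "1.6.1",
    "seaborn": "0.13.2",
}


def check_packages(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (expected, installed)} for every package that does not match."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    if platform.python_version() != EXPECTED_PYTHON:
        print(f"Python {platform.python_version()} found, {EXPECTED_PYTHON} expected")
    for package, (wanted, installed) in check_packages(EXPECTED_PACKAGES).items():
        print(f"{package}: expected {wanted}, found {installed or 'not installed'}")
```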

jnjunior-96/ifood-data-analyst-case
