This project involves the creation of a Data Warehouse (DW) to support a decision-making system for analyzing the management of products in a commerce setting. The process integrates data from multiple sources into a unified warehouse using the ETL (Extract, Transform, Load) process, implemented in Python. The data is then analyzed and visualized using Power BI for actionable insights.
This is the Schema Data WareHouse :
The main objective is to enable the analysis of key indicators such as:
- The number of supplier invoices by year (based on product purchase date).
- The cost of a product, calculated as:
Cost of Product = Montant_Produit * Qt_Produit
The project integrates data from the following files:
- fournisseur.csv: Contains supplier information.
- produits.xls: Details of products.
- cout_produit.csv: Product costs.
- Fournisseur_facture.csv: Supplier invoice data.
The ETL process involves three stages:
Data is extracted from the input files mentioned above using pandas.
Data cleaning, normalization, and transformation are performed to align with the data warehouse schema:
- Creation of dimension tables:
dim_fournisseur
(Supplier Dimension)dim_produit
(Product Dimension)dim_temps
(Time Dimension)
- Creation of the fact table:
fact_ventes
(Sales Fact Table)
The transformed data is loaded into a MySQL database using SQLAlchemy.
- pandas: Data manipulation and transformation.
- SQLAlchemy: Database connection and data loading.
-
Setup MySQL Database
Ensure MySQL is installed and running. Create a database namedbi_data_warehouse
. -
Install Required Libraries
Usepip
to install the required Python libraries:pip install pandas sqlalchemy
-
Run the ETL Script
Execute the Python script to perform the ETL process and load the data into the database. -
Verify Database Content
Check the MySQL database to ensure all tables (dim_fournisseur
,dim_produit
,dim_temps
,fact_ventes
) are populated with the expected data.
The data loaded into the MySQL database is analyzed and visualized using Power BI.
ScreenShots :
The project successfully integrates diverse data sources into a unified data warehouse, enabling efficient reporting and decision-making for product management. The visual insights generated in Power BI empower stakeholders to monitor and optimize key performance indicators effectively.
Feel free to reach out for any questions or enhancements!