
Data-driven business process optimization project: delay analysis, bottleneck detection, and profitability diagnostics using Python and SQLite, with actionable insights and clean, professional reporting.


KeyanEnayati/Business-Process-Optimization


📊 Business Process Optimization Using Python & SQLite


📌 Project Overview

This project presents a comprehensive business process optimization and operational performance analysis using Python and SQLite. It dives deep into:

  • End-to-end delay analysis across the fulfillment pipeline (from procurement to delivery)
  • Diagnostics of fulfillment lags at each process stage
  • Identification of bottlenecks by product, confirming delays are systemic rather than isolated to specific warehouses or branches
  • Order-level profitability analysis to test whether fulfillment delays impact gross margins
  • Downstream performance indicators for business process improvement

All analysis is performed using Python and SQLite. Visuals are built with matplotlib, seaborn, and plotly. The dataset represents three years (2018–2020) of transactional data.


📁 Step 1. Dataset & Table Structure

We simulate an SQL table named sales_data with the following structure:

CREATE TABLE sales_data (
  InvoiceNumber TEXT,
  SalesChannel TEXT,
  WarehouseLocationCode TEXT,
  ProcurementDate TEXT,
  SalesOrderDate TEXT,
  DispatchDate TEXT,
  DeliveryDate TEXT,
  Currency TEXT,
  SalesRepID INTEGER,
  CustomerID INTEGER,
  BranchCode TEXT,
  ProductCode INTEGER,
  QuantityOrdered INTEGER,
  DiscountRate REAL,
  UnitSalePrice REAL,
  UnitProductionCost REAL
);

Each row captures the full cycle of an order — from procurement to dispatch to delivery — including costs, quantities, and prices.

🧹 Step 2. Data Cleaning in SQLite

All cleaning is executed directly in SQLite to ensure data integrity before the data is loaded into Python for analysis.
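The cleaning statements below can also be driven from Python via the standard `sqlite3` module. A minimal sketch, using an in-memory database and made-up rows (the real project would connect to its database file instead), applying the NULL filter from 2.1:

```python
import sqlite3

# Build a tiny illustrative sales_data table in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (InvoiceNumber TEXT, SalesOrderDate TEXT, "
    "CustomerID INTEGER, ProductCode INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [("INV-1", "2018-03-01", 1, 1000),
     (None, "2018-03-02", 2, 1001)],  # row with a NULL invoice number
)

# Run the Step 2.1 cleaning statement as a script.
conn.executescript("""
DELETE FROM sales_data
WHERE InvoiceNumber IS NULL OR SalesOrderDate IS NULL
   OR CustomerID IS NULL OR ProductCode IS NULL;
""")

remaining = conn.execute("SELECT COUNT(*) FROM sales_data").fetchone()[0]
print(remaining)  # 1
```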

🔽 2.1 Remove NULLs in Critical Fields

DELETE FROM sales_data
WHERE
  InvoiceNumber IS NULL OR
  SalesOrderDate IS NULL OR
  CustomerID IS NULL OR
  ProductCode IS NULL;

🔽 2.2 Remove Rows with Zero or Negative Price/Quantity

DELETE FROM sales_data
WHERE
  UnitSalePrice IS NULL OR UnitSalePrice <= 0 OR
  QuantityOrdered IS NULL OR QuantityOrdered <= 0;

🔽 2.3 Keep Only IRR Currency

DELETE FROM sales_data
WHERE Currency != 'IRR';

🔽 2.4 Keep Only Orders Between 2018 and 2020

DELETE FROM sales_data
WHERE
  SalesOrderDate < '2018-01-01' OR
  SalesOrderDate > '2020-12-31';

🔽 2.5 Trim Whitespace in Key Text Fields

UPDATE sales_data
SET
  InvoiceNumber = TRIM(InvoiceNumber),
  SalesChannel = TRIM(SalesChannel),
  WarehouseLocationCode = TRIM(WarehouseLocationCode),
  BranchCode = TRIM(BranchCode);

🔽 2.6 Remove Duplicate Rows

DELETE FROM sales_data
WHERE rowid NOT IN (
  SELECT MIN(rowid)
  FROM sales_data
  GROUP BY InvoiceNumber, CustomerID, ProductCode
);

🔽 2.7 Create Cleaned Table for Analysis

CREATE TABLE sales_data_cleaned AS
SELECT * FROM sales_data;
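To bridge into Step 3, the cleaned table can be exported to the CSV file read below. A minimal sketch, with the table rebuilt in memory from one made-up row for illustration (in practice, connect to the project's database file):

```python
import sqlite3
import pandas as pd

# Rebuild a tiny sales_data_cleaned table in memory for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data_cleaned "
    "(InvoiceNumber TEXT, SalesOrderDate TEXT, UnitSalePrice REAL)"
)
conn.execute("INSERT INTO sales_data_cleaned VALUES ('INV-0001', '2018-04-20', 95.0)")
conn.commit()

# Pull the cleaned table into pandas and write the CSV used in Step 3.1.
df = pd.read_sql_query("SELECT * FROM sales_data_cleaned", conn)
df.to_csv("sales_data_cleaned.csv", index=False)
print(len(df))  # 1
```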

⏱ Step 3. Delay & Lag Analysis

Now that the cleaned dataset has been loaded into Python, we begin by calculating the key time intervals across the order lifecycle:

  • Time between procurement and sales order
  • Time between sales order and dispatch
  • Time between dispatch and delivery
  • Full fulfillment time (procurement → delivery)

These metrics help identify inefficiencies, bottlenecks, and inconsistencies across products, branches, and warehouses.

🔽 3.1 Load Cleaned Data in Python

import pandas as pd

# Load cleaned data into a DataFrame
df = pd.read_csv("sales_data_cleaned.csv")

# Convert dates
date_cols = ['ProcurementDate', 'SalesOrderDate', 'DispatchDate', 'DeliveryDate']
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors='coerce')

🔽 3.2 Create Lag Columns

df['Procurement_to_Order'] = (df['SalesOrderDate'] - df['ProcurementDate']).dt.days
df['Order_to_Dispatch'] = (df['DispatchDate'] - df['SalesOrderDate']).dt.days
df['Dispatch_to_Delivery'] = (df['DeliveryDate'] - df['DispatchDate']).dt.days
df['Total_Fulfillment_Time'] = (df['DeliveryDate'] - df['ProcurementDate']).dt.days
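Because the dates were parsed with `errors='coerce'`, a lag can come out null or negative when a record's dates are missing or out of order. A quick sanity check, sketched here on two synthetic rows (the second deliberately inconsistent):

```python
import pandas as pd

# Two synthetic orders; the second has a sales order date *before* procurement.
df = pd.DataFrame({
    "ProcurementDate": pd.to_datetime(["2018-01-05", "2018-02-01"]),
    "SalesOrderDate": pd.to_datetime(["2018-04-20", "2018-01-15"]),
})
df["Procurement_to_Order"] = (df["SalesOrderDate"] - df["ProcurementDate"]).dt.days

# Flag rows whose lag is null or negative.
bad = df[df["Procurement_to_Order"].isna() | (df["Procurement_to_Order"] < 0)]
print(len(bad))  # 1
```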

🔽 3.3 Basic Lag Statistics

df[['Procurement_to_Order', 'Order_to_Dispatch', 'Dispatch_to_Delivery', 'Total_Fulfillment_Time']].describe()
| | Procurement_to_Order | Order_to_Dispatch | Dispatch_to_Delivery | Total_Fulfillment_Time |
|---|---|---|---|---|
| count | 7991 | 7991 | 7991 | 7991 |
| mean | 109.24 | 15.17 | 5.50 | 129.92 |
| std | 34.37 | 7.79 | 2.86 | 35.30 |
| min | 31 | 2 | 1 | 40 |
| 25% | 83 | 8 | 3 | 103 |
| 50% | 108 | 15 | 5 | 129 |
| 75% | 135 | 22 | 8 | 156 |
| max | 189 | 28 | 10 | 219 |

📋 3.4 Summary & Key Takeaway

Our lag analysis revealed that the biggest bottleneck in the business process is the time between procurement and sales order — a whopping 109 days on average! This long lead time dwarfs the delays in order fulfillment and delivery, which are both relatively quick (just 15 days and 5 days, respectively).

📊 3.5 Visualizing Fulfillment Lag Distributions

Understanding the overall spread and range of each lag metric is essential to see if delays are normal, clustered, or if there are frequent outliers.
Here, we use histograms to quickly spot if most orders cluster around a typical value or if there’s a long “tail” of unusually slow steps.

import matplotlib.pyplot as plt
import seaborn as sns

lag_columns = ['Procurement_to_Order', 'Order_to_Dispatch', 'Dispatch_to_Delivery', 'Total_Fulfillment_Time']

plt.figure(figsize=(16, 8))
for i, col in enumerate(lag_columns, 1):
    plt.subplot(2, 2, i)
    sns.histplot(df[col], kde=True, bins=30, color='skyblue')
    plt.title(f'Distribution of {col.replace("_", " ")}')
    plt.xlabel('Days')
plt.tight_layout()
plt.show()


📝 Insight:

Most of the process is quite predictable—there are no strong outliers or extreme delays visible in the histograms. However, the main opportunity for improvement is in the procurement-to-order lag, which dominates the total fulfillment time.
We’ll next use boxplots to further confirm if any individual extreme outliers are present.

📦 3.6 Boxplots of Fulfillment Lags

Boxplots provide a quick way to spot outliers and compare the overall spread of each process step. This helps us see if a few rare cases are driving up the average, or if the delays are common across most orders.

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(14, 5))
sns.boxplot(data=df[lag_columns], orient='h', palette='pastel')
plt.title('Boxplot of Fulfillment Lag Metrics')
plt.xlabel('Days')
plt.show()


📝 Insight:

  • Procurement to Order and Total Fulfillment Time both show a wide spread but no extreme outliers. This means the long delays are common across the majority of orders, not caused by a few rare cases.
  • Order to Dispatch and Dispatch to Delivery are tightly clustered, confirming that once an order is placed, fulfillment and delivery are fast and consistent.
  • No individual lag step shows significant outliers—delays are a general, process-wide feature rather than isolated incidents.

💸 Step 4. Profitability & Margin Diagnostics

Having identified long fulfillment delays (Step 3), our next question is:
Are these delays actually hurting our margins, or is profitability stable across the business?


4.1 Order-Level Profitability

We calculate gross profit and margin for each order:

df['Revenue'] = df['UnitSalePrice'] * df['QuantityOrdered']
df['ProductionCost'] = df['UnitProductionCost'] * df['QuantityOrdered']
df['GrossProfit'] = df['Revenue'] - df['ProductionCost']
df['GrossMarginPercent'] = (df['GrossProfit'] / df['Revenue']).replace([float('inf'), -float('inf')], 0) * 100

df[['Revenue', 'ProductionCost', 'GrossProfit', 'GrossMarginPercent']].describe()
| | Revenue | ProductionCost | GrossProfit | GrossMarginPercent |
|---|---|---|---|---|
| count | 7991 | 7991 | 7991 | 7991 |
| mean | 10348.23 | 6484.55 | 3863.68 | 37.37 |
| std | 10021.42 | 6556.45 | 4196.86 | 13.23 |
| min | 167.50 | 73.70 | 25.13 | 15.00 |
| 25% | 3195.90 | 1833.92 | 1013.51 | 26.00 |
| 50% | 6994.80 | 4205.99 | 2398.60 | 37.00 |
| 75% | 14780.20 | 8975.86 | 5094.14 | 49.00 |
| max | 52313.60 | 42308.09 | 30905.76 | 60.00 |

📝 Insight:

Margins are strong across almost all orders, with no sign of process-wide losses—even though orders wait a long time to move through the pipeline (Step 3). This means that reducing delays is a direct path to improved cash flow and volume, not just margin repair.


4.2 Do Longer Delays Lead to Lower Margins?

Having established that overall margins are healthy, the next question is whether longer fulfillment times are associated with lower profitability at the order level.
If longer delays lead to reduced gross margin, process improvement is not just about speed, but also about protecting profit.
We group orders into fulfillment speed quartiles and compare their average gross margins to test this relationship.

import pandas as pd
import matplotlib.pyplot as plt

# Create fulfillment speed quartiles
df['FulfillmentSpeed'] = pd.qcut(
    df['Total_Fulfillment_Time'],
    4,
    labels=['Fastest', 'Fast', 'Slow', 'Slowest']
)
# For bar plot: Fastest vs. Slowest
grouped = df.groupby('FulfillmentSpeed', observed=False)['GrossMarginPercent'].mean().reindex(['Fastest', 'Slowest'])

grouped.plot(kind='bar', color=['seagreen', 'indianred'])
plt.title('Average Margin: Fastest vs. Slowest Fulfillment')
plt.ylabel('Average Gross Margin (%)')
plt.xlabel('Fulfillment Speed')
plt.tight_layout()
plt.show()


📝 Insight:

The average gross margin is nearly identical between the fastest- and slowest-fulfilled orders. This confirms that even the slowest processes are not eroding profitability. Therefore, any improvements to reduce fulfillment delays will improve business performance without risking margin—a clear win for operational change.

🏭 Step 5. Process Bottlenecks & Delay Causes

Now that we know where delays happen and that they don’t destroy profitability, our next goal is to identify the root causes behind those delays.

Purpose of this step:

  • To see if bottlenecks are tied to specific products or other order attributes.
  • To highlight which part(s) of the business process need attention for maximum efficiency gains.

We’ll start by reviewing how lag times vary across different products.

# Example: Compare by product
product_lags = df.groupby('ProductCode')['Total_Fulfillment_Time'].mean().sort_values(ascending=False)
product_lags.head(10)  # Show top 10 slowest products
| ProductCode | Avg. Total_Fulfillment_Time (days) |
|---|---|
| 1032 | 139.37 |
| 1065 | 137.84 |
| 1070 | 137.59 |
| 1000 | 136.55 |
| 1021 | 136.53 |
| 1033 | 136.27 |
| 1001 | 135.95 |
| 1046 | 135.42 |
| 1083 | 135.22 |
| 1045 | 135.18 |

📝 Insight: Product-Level Bottlenecks

The analysis shows that certain products consistently have the longest fulfillment times—with top product codes averaging 135–140 days from procurement to delivery.
This pattern indicates that delays are more strongly tied to specific product lines than to location or order attributes.
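The same groupby pattern can be pointed at locations to double-check the "systemic, not location-specific" reading. A sketch on synthetic rows (in the project, run it on the full DataFrame with `WarehouseLocationCode` and `BranchCode`):

```python
import pandas as pd

# Synthetic orders across two warehouses with near-identical lag times.
df = pd.DataFrame({
    "WarehouseLocationCode": ["WH-A", "WH-A", "WH-B", "WH-B"],
    "Total_Fulfillment_Time": [130, 132, 129, 131],
})

# Mean fulfillment time per warehouse, then the max-min spread.
warehouse_lags = df.groupby("WarehouseLocationCode")["Total_Fulfillment_Time"].mean()
spread = warehouse_lags.max() - warehouse_lags.min()
print(spread)  # 1.0
```

A narrow spread across warehouses (and, analogously, branches) supports the conclusion that delays are process-wide rather than driven by one location.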

📈 Step 6. Advanced Operational Metrics

With delays, bottlenecks, and profitability fully analyzed, this step explores additional metrics that provide deeper insight into the business process and highlight further opportunities for operational improvement.

Purpose of this step:

  • Quantify efficiency and reliability of order fulfillment.
  • Reveal “hidden” trends or risks not captured by margin or lag analysis alone.
  • Deliver actionable metrics that can guide next steps in business process optimization.

We will cover:

  • Repeat order rate: How many customers return for multiple purchases (a proxy for customer satisfaction and process reliability)?
  • Order volume over time: Are process changes (or external shocks) impacting throughput?
  • Discount analysis: Are high discount rates related to specific lag patterns or product lines?

6.1 Repeat Order Rate

# Percentage of customers placing more than one order
repeat_customers = (
    df.groupby('CustomerID')['InvoiceNumber']
    .nunique()
    .reset_index(name='OrderCount')
)
repeat_customer_pct = (repeat_customers['OrderCount'] > 1).mean() * 100
print(f"🔁 Repeat order rate: {repeat_customer_pct:.1f}%")
🔁 Repeat order rate: 99.9%

📝 Insight:

The 99.9% repeat order rate suggests that, despite long fulfillment lags, customers remain loyal. A low rate would have signaled that delays were undermining long-term relationships.

6.2 Order Volume Over Time

# Monthly order volume
df['OrderMonth'] = pd.to_datetime(df['SalesOrderDate']).dt.to_period('M')
orders_per_month = df.groupby('OrderMonth')['InvoiceNumber'].nunique()

orders_per_month.plot(kind='bar', figsize=(10, 4), color='steelblue')
plt.title('Monthly Order Volume')
plt.ylabel('Number of Orders')
plt.xlabel('Month')
plt.tight_layout()
plt.show()


📝 Insight:

Order volumes remained relatively stable throughout the 2018–2020 period, with only minor fluctuations month-to-month. There are no obvious periods of sustained decline or sharp spikes, suggesting that—despite fulfillment delays—the business maintained consistent throughput. This stability indicates strong underlying demand and process resilience, though the absence of significant volume growth may point to a hidden opportunity for further improvement if fulfillment speed is increased.

6.3 Discount Rate and Lag Correlation

The goal is to determine whether longer fulfillment times are being compensated with higher discounts, potentially eroding profitability.

Analysis:
We calculated the average discount rate for each fulfillment speed group (Fastest, Fast, Slow, Slowest).

print(df.groupby('FulfillmentSpeed', observed=False)['DiscountRate'].mean())
| FulfillmentSpeed | Avg. DiscountRate |
|---|---|
| Fastest | 0.1158 |
| Fast | 0.1118 |
| Slow | 0.1155 |
| Slowest | 0.1145 |

📝 Insight:

The average discount rate is essentially identical (about 11–12%) across all fulfillment speed groups. This confirms that pricing is stable and unaffected by order processing delays.
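As a complementary check, a Pearson correlation between fulfillment time and discount rate condenses the same relationship into one number. A sketch on synthetic rows (values chosen for illustration; a near-zero coefficient on the real data would match the grouped averages above):

```python
import pandas as pd

# Synthetic orders where discount rate does not track fulfillment time.
df = pd.DataFrame({
    "Total_Fulfillment_Time": [100, 120, 140, 160],
    "DiscountRate": [0.11, 0.12, 0.12, 0.11],
})

# Pearson correlation between lag and discount; near zero means pricing
# is independent of fulfillment speed.
corr = df["Total_Fulfillment_Time"].corr(df["DiscountRate"])
print(round(corr, 2))
```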

✅ Step 7. Conclusion & Process Recommendations

Summary of Key Findings:

  • Order fulfillment is consistently slow across all products and process stages, with average total lead times over four months.
  • No single branch or warehouse is responsible for delays—bottlenecks are systemic, not location-specific.
  • Profitability remains strong and stable regardless of fulfillment speed. Delays do not erode gross margin or force higher discounts.
  • Customer loyalty is resilient: Repeat order rates are high, and discounting practices are consistent, suggesting that core business value is preserved despite lag.
  • Order volume is steady over time, indicating reliable demand and process stability, but not breakthrough growth.

Recommendations for Business Process Improvement:

  1. Prioritize process speed-ups:
    Reducing fulfillment lag will directly improve cash flow, asset turnover, and customer experience—without risking profitability.
  2. Target systemic improvements:
    Since bottlenecks are not tied to a specific branch or warehouse, focus on end-to-end process reengineering (e.g., procurement automation, inventory management, supplier integration).
  3. Leverage customer goodwill:
    Loyal customers have tolerated delays so far; faster fulfillment could unlock new growth and retention without requiring price incentives.
  4. Monitor after changes:
    Track profitability, order volume, and customer repeat rates as improvements are made to ensure performance gains are sustained and risks are controlled.
