- 📌 Project Overview
- 📁 Step 1. Dataset & Table Structure
- 🧹 Step 2. Data Cleaning in SQLite
- ⏱ Step 3. Delay & Lag Analysis
- 💸 Step 4. Profitability & Margin Diagnostics
- 🏭 Step 5. Process Bottlenecks & Delay Causes
- 📈 Step 6. Advanced Operational Metrics
- ✅ Step 7. Conclusion & Process Recommendations
This project presents a comprehensive business process optimization and operational performance analysis using Python and SQLite. It dives deep into:
- End-to-end delay analysis across the fulfillment pipeline (from procurement to delivery)
- Diagnostics of fulfillment lags at each process stage
- Identification of bottlenecks by product, confirming delays are systemic rather than isolated to specific warehouses or branches
- Order-level profitability analysis to test whether fulfillment delays impact gross margins
- Downstream performance indicators for business process improvement
All analysis is performed using Python and SQLite. Visuals are built with matplotlib, seaborn, and plotly. The dataset represents three years (2018–2020) of transactional data.
We simulate an SQL table named sales_data with the following structure:
CREATE TABLE sales_data (
InvoiceNumber TEXT,
SalesChannel TEXT,
WarehouseLocationCode TEXT,
ProcurementDate TEXT,
SalesOrderDate TEXT,
DispatchDate TEXT,
DeliveryDate TEXT,
Currency TEXT,
SalesRepID INTEGER,
CustomerID INTEGER,
BranchCode TEXT,
ProductCode INTEGER,
QuantityOrdered INTEGER,
DiscountRate REAL,
UnitSalePrice REAL,
UnitProductionCost REAL
);
Each row captures the full cycle of an order — from procurement to dispatch to delivery — including costs, quantities, and prices.
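For readers following along, the schema above can be materialized from Python via the standard-library sqlite3 module. This is only a sketch: the in-memory database and the sample row are made up for illustration, not part of the project's actual data.

```python
import sqlite3

# In-memory database for illustration; the project would use a file on disk
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sales_data (
    InvoiceNumber TEXT,
    SalesChannel TEXT,
    WarehouseLocationCode TEXT,
    ProcurementDate TEXT,
    SalesOrderDate TEXT,
    DispatchDate TEXT,
    DeliveryDate TEXT,
    Currency TEXT,
    SalesRepID INTEGER,
    CustomerID INTEGER,
    BranchCode TEXT,
    ProductCode INTEGER,
    QuantityOrdered INTEGER,
    DiscountRate REAL,
    UnitSalePrice REAL,
    UnitProductionCost REAL
)
""")

# One made-up row to confirm the schema accepts a full order record
conn.execute(
    "INSERT INTO sales_data VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
    ("INV-0001", "Online", "WH-A", "2018-01-05", "2018-04-20",
     "2018-05-02", "2018-05-07", "IRR", 12, 501, "BR-01",
     1032, 3, 0.10, 250.0, 160.0),
)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM sales_data").fetchone()[0]
print(row_count)
```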
All cleaning is executed directly within SQLite to ensure data integrity before the table is loaded into Python for analysis.
-- 1. Drop rows missing key identifiers
DELETE FROM sales_data
WHERE
    InvoiceNumber IS NULL OR
    SalesOrderDate IS NULL OR
    CustomerID IS NULL OR
    ProductCode IS NULL;

-- 2. Drop rows with invalid prices or quantities
DELETE FROM sales_data
WHERE
    UnitSalePrice IS NULL OR UnitSalePrice <= 0 OR
    QuantityOrdered IS NULL OR QuantityOrdered <= 0;

-- 3. Keep a single currency (IRR) so amounts are comparable
DELETE FROM sales_data
WHERE Currency != 'IRR';

-- 4. Restrict to the 2018-2020 analysis window
DELETE FROM sales_data
WHERE
    SalesOrderDate < '2018-01-01' OR
    SalesOrderDate > '2020-12-31';

-- 5. Trim stray whitespace from code fields
UPDATE sales_data
SET
    InvoiceNumber = TRIM(InvoiceNumber),
    SalesChannel = TRIM(SalesChannel),
    WarehouseLocationCode = TRIM(WarehouseLocationCode),
    BranchCode = TRIM(BranchCode);

-- 6. De-duplicate: keep the first row per invoice/customer/product
DELETE FROM sales_data
WHERE rowid NOT IN (
    SELECT MIN(rowid)
    FROM sales_data
    GROUP BY InvoiceNumber, CustomerID, ProductCode
);

-- 7. Snapshot the cleaned result into a new table
CREATE TABLE sales_data_cleaned AS
SELECT * FROM sales_data;
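The cleaned table then needs to reach Python as sales_data_cleaned.csv (the file read in Step 3). One plausible bridge, sketched here on a tiny in-memory table with made-up columns, is pandas.read_sql_query followed by to_csv:

```python
import sqlite3
import pandas as pd

# Tiny in-memory stand-in for the project's SQLite database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data_cleaned (InvoiceNumber TEXT, SalesOrderDate TEXT)")
conn.executemany(
    "INSERT INTO sales_data_cleaned VALUES (?, ?)",
    [("INV-0001", "2018-04-20"), ("INV-0002", "2019-07-11")],
)
conn.commit()

# Pull the cleaned table into pandas and export the CSV consumed in Step 3
df = pd.read_sql_query("SELECT * FROM sales_data_cleaned", conn)
df.to_csv("sales_data_cleaned.csv", index=False)
print(df.shape)
```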
Now that the cleaned dataset has been loaded into Python, we begin by calculating the key time intervals across the order lifecycle:
- Time between procurement and sales order
- Time between sales order and dispatch
- Time between dispatch and delivery
- Full fulfillment time (procurement → delivery)
These metrics help identify inefficiencies, bottlenecks, and inconsistencies across products, branches, and warehouses.
import pandas as pd
# Load cleaned data into a DataFrame
df = pd.read_csv("sales_data_cleaned.csv")
# Convert dates
date_cols = ['ProcurementDate', 'SalesOrderDate', 'DispatchDate', 'DeliveryDate']
for col in date_cols:
df[col] = pd.to_datetime(df[col], errors='coerce')
df['Procurement_to_Order'] = (df['SalesOrderDate'] - df['ProcurementDate']).dt.days
df['Order_to_Dispatch'] = (df['DispatchDate'] - df['SalesOrderDate']).dt.days
df['Dispatch_to_Delivery'] = (df['DeliveryDate'] - df['DispatchDate']).dt.days
df['Total_Fulfillment_Time'] = (df['DeliveryDate'] - df['ProcurementDate']).dt.days
df[['Procurement_to_Order', 'Order_to_Dispatch', 'Dispatch_to_Delivery', 'Total_Fulfillment_Time']].describe()
| | Procurement_to_Order | Order_to_Dispatch | Dispatch_to_Delivery | Total_Fulfillment_Time |
|---|---|---|---|---|
| count | 7991.000000 | 7991.000000 | 7991.000000 | 7991.000000 |
| mean | 109.243149 | 15.168940 | 5.503942 | 129.916031 |
| std | 34.371490 | 7.793951 | 2.855809 | 35.297567 |
| min | 31.000000 | 2.000000 | 1.000000 | 40.000000 |
| 25% | 83.000000 | 8.000000 | 3.000000 | 103.000000 |
| 50% | 108.000000 | 15.000000 | 5.000000 | 129.000000 |
| 75% | 135.000000 | 22.000000 | 8.000000 | 156.000000 |
| max | 189.000000 | 28.000000 | 10.000000 | 219.000000 |
Our lag analysis revealed that the biggest bottleneck in the business process is the time between procurement and sales order — a whopping 109 days on average! This long lead time dwarfs the delays in order fulfillment and delivery, which are both relatively quick (just 15 days and 5 days, respectively).
Understanding the overall spread and range of each lag metric is essential to see if delays are normal, clustered, or if there are frequent outliers.
Here, we use histograms to quickly spot if most orders cluster around a typical value or if there’s a long “tail” of unusually slow steps.
import matplotlib.pyplot as plt
import seaborn as sns
lag_columns = ['Procurement_to_Order', 'Order_to_Dispatch', 'Dispatch_to_Delivery', 'Total_Fulfillment_Time']
plt.figure(figsize=(16, 8))
for i, col in enumerate(lag_columns, 1):
plt.subplot(2, 2, i)
sns.histplot(df[col], kde=True, bins=30, color='skyblue')
plt.title(f'Distribution of {col.replace("_", " ")}')
plt.xlabel('Days')
plt.tight_layout()
plt.show()
Most of the process is quite predictable—there are no strong outliers or extreme delays visible in the histograms. However, the main opportunity for improvement is in the procurement-to-order lag, which dominates the total fulfillment time.
We’ll next use boxplots to further confirm if any individual extreme outliers are present.
Boxplots provide a quick way to spot outliers and compare the overall spread of each process step. This helps us see if a few rare cases are driving up the average, or if the delays are common across most orders.
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14, 5))
sns.boxplot(data=df[lag_columns], orient='h', palette='pastel')
plt.title('Boxplot of Fulfillment Lag Metrics')
plt.xlabel('Days')
plt.show()
- Procurement to Order and Total Fulfillment Time both show a wide spread but no extreme outliers. This means the long delays are common across the majority of orders, not caused by a few rare cases.
- Order to Dispatch and Dispatch to Delivery are tightly clustered, confirming that once an order is placed, fulfillment and delivery are fast and consistent.
- No individual lag step shows significant outliers—delays are a general, process-wide feature rather than isolated incidents.
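The visual "no significant outliers" reading can be quantified with the same 1.5×IQR rule a boxplot's whiskers use. The sketch below runs on synthetic lag values (drawn to resemble the Procurement_to_Order summary statistics); in the project it would run on each column of df[lag_columns].

```python
import numpy as np
import pandas as pd

# Synthetic lag values resembling Procurement_to_Order (mean ~109, std ~34)
rng = np.random.default_rng(0)
lags = pd.Series(rng.normal(109, 34, 500).round())

# Count points beyond the boxplot whiskers (1.5 * IQR past the quartiles)
q1, q3 = lags.quantile(0.25), lags.quantile(0.75)
iqr = q3 - q1
outliers = lags[(lags < q1 - 1.5 * iqr) | (lags > q3 + 1.5 * iqr)]
print(f"Outliers beyond 1.5*IQR: {len(outliers)} of {len(lags)}")
```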
Having identified long fulfillment delays (Step 3), our next question is:
Are these delays actually hurting our margins, or is profitability stable across the business?
We calculate gross profit and margin for each order:
df['Revenue'] = df['UnitSalePrice'] * df['QuantityOrdered']
df['ProductionCost'] = df['UnitProductionCost'] * df['QuantityOrdered']
df['GrossProfit'] = df['Revenue'] - df['ProductionCost']
df['GrossMarginPercent'] = (df['GrossProfit'] / df['Revenue']).replace([float('inf'), -float('inf')], 0) * 100
df[['Revenue', 'ProductionCost', 'GrossProfit', 'GrossMarginPercent']].describe()
| | Revenue | ProductionCost | GrossProfit | GrossMarginPercent |
|---|---|---|---|---|
| count | 7991.000000 | 7991.000000 | 7991.000000 | 7991.000000 |
| mean | 10348.232587 | 6484.551694 | 3863.680892 | 37.369416 |
| std | 10021.420331 | 6556.445655 | 4196.856592 | 13.225177 |
| min | 167.500000 | 73.700000 | 25.125000 | 15.000000 |
| 25% | 3195.900000 | 1833.924000 | 1013.509000 | 26.000000 |
| 50% | 6994.800000 | 4205.992000 | 2398.600000 | 37.000000 |
| 75% | 14780.200000 | 8975.856000 | 5094.144000 | 49.000000 |
| max | 52313.600000 | 42308.088000 | 30905.760000 | 60.000000 |
Margins are strong across almost all orders, with no sign of process-wide losses—even though orders wait a long time to move through the pipeline (Step 3). This means that reducing delays is a direct path to improved cash flow and volume, not just margin repair.
Having established that overall margins are healthy, the next question is whether longer fulfillment times are associated with lower profitability at the order level.
If longer delays lead to reduced gross margin, process improvement is not just about speed, but also about protecting profit.
We group orders into fulfillment speed quartiles and compare their average gross margins to test this relationship.
import pandas as pd
# Create fulfillment speed quartiles
df['FulfillmentSpeed'] = pd.qcut(
df['Total_Fulfillment_Time'],
4,
labels=['Fastest', 'Fast', 'Slow', 'Slowest']
)
# For bar plot: Fastest vs. Slowest
grouped = df.groupby('FulfillmentSpeed', observed=False)['GrossMarginPercent'].mean().reindex(['Fastest', 'Slowest'])
grouped.plot(kind='bar', color=['seagreen', 'indianred'])
plt.title('Average Margin: Fastest vs. Slowest Fulfillment')
plt.ylabel('Average Gross Margin (%)')
plt.xlabel('Fulfillment Speed')
plt.tight_layout()
plt.show()
The average gross margin is nearly identical between the fastest- and slowest-fulfilled orders. This confirms that even the slowest processes are not eroding profitability. Therefore, any improvements to reduce fulfillment delays will improve business performance without risking margin—a clear win for operational change.
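A complementary check is the Pearson correlation between fulfillment time and margin; a value near zero tells the same story as the quartile bars. The sketch below uses synthetic, independent columns for illustration; the project would call df['Total_Fulfillment_Time'].corr(df['GrossMarginPercent']) directly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins: lag and margin drawn independently, mirroring the finding
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Total_Fulfillment_Time": rng.normal(130, 35, 1000),
    "GrossMarginPercent": rng.uniform(15, 60, 1000),
})

# A near-zero r supports "delays do not erode margin"
corr = demo["Total_Fulfillment_Time"].corr(demo["GrossMarginPercent"])
print(f"Pearson r = {corr:.3f}")
```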
Now that we know where delays happen and that they don’t destroy profitability, our next goal is to identify the root causes behind those delays.
Purpose of this step:
- To see if bottlenecks are tied to specific products, or other order attributes.
- To highlight which part(s) of the business process need attention for maximum efficiency gains.
We’ll start by reviewing how lag times vary across different products.
# Example: Compare by product
product_lags = df.groupby('ProductCode')['Total_Fulfillment_Time'].mean().sort_values(ascending=False)
product_lags.head(10) # Show top 10 slowest products
| ProductCode | Average_Fulfillment_Time (days) |
|---|---|
| 1032 | 139.37 |
| 1065 | 137.84 |
| 1070 | 137.59 |
| 1000 | 136.55 |
| 1021 | 136.53 |
| 1033 | 136.27 |
| 1001 | 135.95 |
| 1046 | 135.42 |
| 1083 | 135.22 |
| 1045 | 135.18 |
The analysis shows that certain products consistently have the longest fulfillment times—with top product codes averaging 135–140 days from procurement to delivery.
This pattern indicates that delays are more strongly tied to specific product lines than to location or order attributes.
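One way to back this claim is to compare how much the mean lag varies across products versus across warehouses. The sketch below builds synthetic orders in which lag depends only on ProductCode (an assumption mirroring the finding); in the project, the same two groupby calls would run on df with its real columns.

```python
import numpy as np
import pandas as pd

# Synthetic orders: lag driven by product, not warehouse (illustrative assumption)
rng = np.random.default_rng(1)
n = 2000
demo = pd.DataFrame({
    "ProductCode": rng.choice(range(1000, 1010), n),
    "WarehouseLocationCode": rng.choice(["WH-A", "WH-B", "WH-C"], n),
})
demo["Total_Fulfillment_Time"] = (
    100 + (demo["ProductCode"] - 1000) * 4 + rng.normal(0, 5, n)
)

# If a grouping key explains the delays, its group means should spread widely
for key in ["ProductCode", "WarehouseLocationCode"]:
    means = demo.groupby(key)["Total_Fulfillment_Time"].mean()
    print(f"{key}: spread of group means = {means.max() - means.min():.1f} days")
```

Here the product-level spread dwarfs the warehouse-level spread, which is exactly the signature seen in the real data.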
With delays, bottlenecks, and profitability fully analyzed, this step explores additional metrics that provide deeper insight into the business process and highlight further opportunities for operational improvement.
Purpose of this step:
- Quantify efficiency and reliability of order fulfillment.
- Reveal “hidden” trends or risks not captured by margin or lag analysis alone.
- Deliver actionable metrics that can guide next steps in business process optimization.
We will cover:
- Repeat order rate: How many customers return for multiple purchases (a proxy for customer satisfaction and process reliability)?
- Order volume over time: Are process changes (or external shocks) impacting throughput?
- Discount analysis: Are high discount rates related to specific lag patterns or product lines?
# Percentage of customers placing more than one order
repeat_customers = (
df.groupby('CustomerID')['InvoiceNumber']
.nunique()
.reset_index(name='OrderCount')
)
repeat_customer_pct = (repeat_customers['OrderCount'] > 1).mean() * 100
print(f"🔁 Repeat order rate: {repeat_customer_pct:.1f}%")
🔁 Repeat order rate: 99.9%
A high repeat order rate suggests that—despite long fulfillment lags—customers remain loyal. If the rate is low, delays may be undermining long-term relationships.
# Monthly order volume
df['OrderMonth'] = pd.to_datetime(df['SalesOrderDate']).dt.to_period('M')
orders_per_month = df.groupby('OrderMonth')['InvoiceNumber'].nunique()
orders_per_month.plot(kind='bar', figsize=(10, 4), color='steelblue')
plt.title('Monthly Order Volume')
plt.ylabel('Number of Orders')
plt.xlabel('Month')
plt.tight_layout()
plt.show()
Order volumes remained relatively stable throughout the 2018–2020 period, with only minor fluctuations month-to-month. There are no obvious periods of sustained decline or sharp spikes, suggesting that—despite fulfillment delays—the business maintained consistent throughput. This stability indicates strong underlying demand and process resilience, though the absence of significant volume growth may point to a hidden opportunity for further improvement if fulfillment speed is increased.
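The "no significant volume growth" observation can be made precise by fitting a linear trend to the monthly counts; a slope near zero means flat throughput. The sketch below uses synthetic monthly counts; in the project, orders_per_month.values would replace them.

```python
import numpy as np
import pandas as pd

# Synthetic monthly order counts for 2018-2020, fluctuating around 220
rng = np.random.default_rng(7)
months = pd.period_range("2018-01", "2020-12", freq="M")
counts = pd.Series(220 + rng.integers(-15, 16, len(months)), index=months)

# Least-squares line through the series: slope ~ 0 means no sustained growth
x = np.arange(len(counts))
slope, intercept = np.polyfit(x, counts.values, 1)
print(f"Trend slope: {slope:+.2f} orders per month")
```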
To determine if longer fulfillment times are being compensated by higher discounts—potentially eroding profitability.
Analysis:
We calculated the average discount rate for each fulfillment speed group (Fastest, Fast, Slow, Slowest).
print(df.groupby('FulfillmentSpeed', observed=False)['DiscountRate'].mean())

| FulfillmentSpeed | Average_Discount_Rate |
|---|---|
| Fastest | 0.1158 |
| Fast | 0.1118 |
| Slow | 0.1155 |
| Slowest | 0.1145 |
The average discount rate is nearly identical (roughly 11–12%) across all fulfillment speed groups. This confirms that pricing is stable and unaffected by order processing delays.
Summary of Key Findings:
- Order fulfillment is consistently slow across all products and process stages, with average total lead times over four months.
- No single branch or warehouse is responsible for delays—bottlenecks are systemic, not location-specific.
- Profitability remains strong and stable regardless of fulfillment speed. Delays do not erode gross margin or force higher discounts.
- Customer loyalty is resilient: Repeat order rates are high, and discounting practices are consistent, suggesting that core business value is preserved despite lag.
- Order volume is steady over time, indicating reliable demand and process stability, but not breakthrough growth.
Recommendations for Business Process Improvement:
- Prioritize process speed-ups: Reducing fulfillment lag will directly improve cash flow, asset turnover, and customer experience—without risking profitability.
- Target systemic improvements: Since bottlenecks are not tied to a specific branch or warehouse, focus on end-to-end process reengineering (e.g., procurement automation, inventory management, supplier integration).
- Leverage customer goodwill: Loyal customers have tolerated delays so far; faster fulfillment could unlock new growth and retention without requiring price incentives.
- Monitor after changes: Track profitability, order volume, and customer repeat rates as improvements are made to ensure performance gains are sustained and risks are controlled.