Skip to content

SQL-based analysis of a manufacturing process using Statistical Process Control (SPC) to detect out-of-control parts and evaluate operator performance. Includes visualizations and alert rate diagnostics using Python.

License

Notifications You must be signed in to change notification settings

I-Mukhatov/sql-spc-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Evaluating a Manufacturing Process using SQL & Python

Visual Summary


Project Overview

This project analyzes a historical dataset from a manufacturing process using SQL and Python to implement Statistical Process Control (SPC). The goal is to monitor product quality and operator performance by identifying parts that fall outside acceptable height tolerances.

The project is based on the final assignment from the DataCamp SQL for Business Analysts skill track and has been extended with deeper analysis and visualizations for portfolio demonstration.


Project Objectives

  • Detect anomalies in product height measurements using dynamic control limits.
  • Identify operators who produce a higher-than-average number of out-of-control parts.
  • Visualize the control process and operator performance through meaningful charts.

Key Concepts & Techniques

  • Window functions in SQL to calculate rolling averages and standard deviations.
  • SPC limits using control limit formulas:
    • UCL = avg + 3 * stddev / sqrt(n)
    • LCL = avg - 3 * stddev / sqrt(n)
  • Nested queries & CTEs to structure multi-step logic.
  • Boolean alert flags to detect quality issues.
  • Alert rate analysis to compare operator performance.
  • Data visualization with Seaborn and Matplotlib.

Dataset Overview

Schema name: PUBLIC

Column Definition Data type
item_no Item number BIGINT
length Length of the item made DOUBLE PRECISION
width Width of the item made DOUBLE PRECISION
height Height of the item made DOUBLE PRECISION
operator Operating machine TEXT

Data Source: This project uses fictional manufacturing data provided in DataCamp's Data Sources repository for DataLab projects.


SQL Analysis

1. SQL query for detecting part measurements anomalies.

  • SQL query calculates the upper control limit (UCL) and lower control limit (LCL) based on the average height and standard deviation of the last 5 height measurement samples for each operator.
  • If the height is outside these limits, it sets an alert flag to TRUE; otherwise, it sets it to FALSE.
  • The query only considers rows where there are at least 5 preceding rows (row_number >= 5).
-- Flag whether the height of a part is within the control limits 
SELECT
	b.*,
	CASE 
		WHEN b.height NOT BETWEEN b.lcl AND b.ucl THEN TRUE
		ELSE FALSE 
	END AS alert
FROM
...

👉 See full query

2. SQL query for identifying operators who produce a higher-than-average number of out-of-control parts.

  • SQL query is designed to identify and analyze height alerts for manufactured parts based on statistical control limits (UCL, LCL).
  • It consists of several Common Table Expressions (CTEs) and a final SELECT statement.
WITH alerts AS (
	SELECT
	b.*,
	CASE WHEN b.height NOT BETWEEN b.lcl AND b.ucl
		 THEN TRUE
		 ELSE FALSE
		 END AS alert
	FROM
...

👉 See full query


Visualizations

Control Chart (Dynamic)

A moving control chart was plotted for each operator, showing a part height measurements over time with rolling UCL/LCL/mean bands and color-coded alerts.

  • 📈 Control Chart (aka Shewhart Chart) components:
  • X-axis: row_number
  • Y-axis: height
  • Lines: UCL (Upper Control Limit), LCL (Lower Control Limit), Mean (avg_height)
  • Points: Color-coded by alert (red for True, blue for False)
  • Option: Separate chart per operator using subplots or filters

A dataset generated by the SQL query is loaded into the df_alerts data frame.

👉 See visualization 1 setup

Control Chart (Fixed)

To complement the dynamic version, a traditional SPC control chart was added using fixed global limits for one operator to contrast methods and assumptions.

  • To visualize process control stability over time, used fixed control limits calculated from the entire data sample.
  • Computed overall mean height for operator 'Op-1', overall stddev for height, UCL, and LCL.
  • Used fixed control limits for plotting; rolling values for calculation and flagging.

👉 See visualization 1 alternative setup

Alerts by Operator

A bar chart displayed the number of alerts per operator, highlighting those above the average in a distinct color.

  • 📊 Bar Chart components:
  • X-axis: operator
  • Y-axis: alert_count
  • Colors: assigned based on above_average flag

A dataset generated by the SQL query is loaded into the df_operator_alerts data frame.

👉 See visualization 2 setup

Alert Rate Lollipop Chart

This visual shows normalized alert rates per operator with a reference line for the average rate - making outliers easy to spot.

  • 📊 Lollipop Chart alert_rate by operator
  • Highlights operators who generates out-of-spec alerts on a higher than average rate
  • Draws a reference line at avg_alert_rate

A dataset generated by the SQL query is loaded into the df_operator_alerts data frame.

👉 See visualization 3 setup


Key Findings

  • A few operators consistently exceeded control limits at significantly higher rates than average.
  • Dynamic control limits revealed subtle shifts in process quality that fixed limits might miss.
  • Visualizing both control methods helped contextualize outlier behavior and select appropriate monitoring strategies.

Next Steps (Optional Extensions)

  • Automate alert monitoring via SQL dashboard or BI tool
  • Analyze operator performance trends over time
  • Combine with maintenance logs for root cause analysis

Tools & Technologies

  • PostgreSQL / SQL (window functions, CTEs)
  • Python (Pandas, Seaborn, Matplotlib)
  • Jupyter Notebook
  • GitHub

📬 Contact

Feel free to reach out if you’d like to discuss the project or see more examples of SQL analysis.

About

SQL-based analysis of a manufacturing process using Statistical Process Control (SPC) to detect out-of-control parts and evaluate operator performance. Includes visualizations and alert rate diagnostics using Python.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published