This project analyzes a historical dataset from a manufacturing process using SQL and Python to implement Statistical Process Control (SPC). The goal is to monitor product quality and operator performance by identifying parts that fall outside acceptable height tolerances.
The project is based on the final assignment from the DataCamp SQL for Business Analysts skill track and has been extended with deeper analysis and visualizations for portfolio demonstration.
- Detect anomalies in product height measurements using dynamic control limits.
- Identify operators who produce a higher-than-average number of out-of-control parts.
- Visualize the control process and operator performance through meaningful charts.
- Window functions in SQL to calculate rolling averages and standard deviations.
- SPC limits using control limit formulas:
UCL = avg + 3 * stddev / sqrt(n)LCL = avg - 3 * stddev / sqrt(n)
- Nested queries & CTEs to structure multi-step logic.
- Boolean alert flags to detect quality issues.
- Alert rate analysis to compare operator performance.
- Data visualization with Seaborn and Matplotlib.
Table Name: manufacturing_parts
| Column | Definition | Data type |
|---|---|---|
item_no |
Item number | BIGINT |
length |
Length of the item made | DOUBLE PRECISION |
width |
Width of the item made | DOUBLE PRECISION |
height |
Height of the item made | DOUBLE PRECISION |
operator |
Operating machine | TEXT |
Data Source: This project uses fictional manufacturing data provided in DataCamp's Data Sources repository for DataLab projects.
- SQL query calculates the upper control limit (UCL) and lower control limit (LCL) based on the average height and standard deviation of the last 5 height measurement samples for each operator.
- If the height is outside these limits, it sets an alert flag to TRUE; otherwise, it sets it to FALSE.
- The query only considers rows where there are at least 5 preceding rows (row_number >= 5).
-- Flag whether the height of a part is within the control limits
SELECT
b.*,
CASE
WHEN b.height NOT BETWEEN b.lcl AND b.ucl THEN TRUE
ELSE FALSE
END AS alert
FROM
...2. SQL query for identifying operators who produce a higher-than-average number of out-of-control parts.
- SQL query is designed to identify and analyze height alerts for manufactured parts based on statistical control limits (UCL, LCL).
- It consists of several Common Table Expressions (CTEs) and a final SELECT statement.
WITH alerts AS (
SELECT
b.*,
CASE WHEN b.height NOT BETWEEN b.lcl AND b.ucl
THEN TRUE
ELSE FALSE
END AS alert
FROM
...A moving control chart was plotted for each operator, showing a part height measurements over time with rolling UCL/LCL/mean bands and color-coded alerts.
- 📈 Control Chart (aka Shewhart Chart) components:
- X-axis: row_number
- Y-axis: height
- Lines:
UCL(Upper Control Limit),LCL(Lower Control Limit),Mean(avg_height) - Points: Color-coded by
alert(red for True, blue for False) - Option: Separate chart per operator using subplots or filters
A dataset generated by the SQL query is loaded into the df_alerts data frame.
To complement the dynamic version, a traditional SPC control chart was added using fixed global limits for one operator to contrast methods and assumptions.
- To visualize process control stability over time, used fixed control limits calculated from the entire data sample.
- Computed overall mean height for operator 'Op-1', overall stddev for height, UCL, and LCL.
- Used fixed control limits for plotting; rolling values for calculation and flagging.
👉 See visualization 1 alternative setup
A bar chart displayed the number of alerts per operator, highlighting those above the average in a distinct color.
- 📊 Bar Chart components:
- X-axis: operator
- Y-axis: alert_count
- Colors: assigned based on above_average flag
A dataset generated by the SQL query is loaded into the df_operator_alerts data frame.
This visual shows normalized alert rates per operator with a reference line for the average rate - making outliers easy to spot.
- 📊 Lollipop Chart alert_rate by operator
- Highlights operators who generates out-of-spec alerts on a higher than average rate
- Draws a reference line at avg_alert_rate
A dataset generated by the SQL query is loaded into the df_operator_alerts data frame.
- A few operators consistently exceeded control limits at significantly higher rates than average.
- Dynamic control limits revealed subtle shifts in process quality that fixed limits might miss.
- Visualizing both control methods helped contextualize outlier behavior and select appropriate monitoring strategies.
- Automate alert monitoring via SQL dashboard or BI tool
- Analyze operator performance trends over time
- Combine with maintenance logs for root cause analysis
- PostgreSQL / SQL (window functions, CTEs)
- Python (Pandas, Seaborn, Matplotlib)
- Jupyter Notebook
- GitHub
Feel free to reach out if you’d like to discuss the project or see more examples of SQL analysis.