Student Dropout Prediction: Fairness‐Aware Machine Learning Pipeline

Authors: Anna‐Maria Fiederling & Louis Brammer
Institution: Católica Lisbon School of Business and Economics (M.Sc. Business Analytics)
Course: AI Fairness and Interpretability (May 2025)

Overview

Student dropout poses a significant challenge in higher education, with individual and societal consequences that extend far beyond graduation rates. In this project, we develop a comprehensive, fairness‐aware machine learning pipeline to predict dropout risk under an assistive‐intervention paradigm. Our goal is twofold:

Maximize Recall of At‐Risk Students

Ensure Equitable Treatment Across Demographic Groups

We combine statistical testing, multiple model baselines, bias‐mitigation techniques (pre‐processing, in‐processing, post‐processing), and explainability (SHAP) to produce an early‐warning tool that is both accurate and fair.

Data

We use the publicly available “Predict Students’ Dropout and Academic Success” dataset from the UCI Machine Learning Repository, containing 4,424 entries and 36 features, including:

Demographics & Background: Gender, Age at enrollment, Nationality, Scholarship holder

Academic Records: Application mode, admission grade, 1st/2nd semester approved units, grades, evaluations

Parental & Socioeconomic Indicators: Mother’s/Father’s qualification & occupation, regional unemployment/inflation/GDP

Other Factors: Educational special needs, displaced status, debtor status, tuition fees up to date

Target: Dropout = 1, Enrolled/Graduated = 0 (binary) for fairness‐aware modeling

Methods

1. Exploratory Data Analysis (EDA)

Chi‐Square Tests of Independence
• Hypothesis: Male students are approximately twice as likely to drop out as females.
• Results: Gender (χ² = 183.16, p < 0.001), Scholarship (χ² = 265.10, p < 0.001), parental education & occupation all significant (p < 0.001); Nationality & special needs non‐significant.

Cramér’s V Effect Sizes
• Scholarship status: V ≈ 0.24 (strongest association)
• Gender: V ≈ 0.20 (moderate)
• Mother’s Occupation: V ≈ 0.20; Father’s Occupation: V ≈ 0.17
• Indicates which predictors carry the most signal (chi‐square & Cramér’s V code in 01_EDA.ipynb; a minimal sketch follows below).
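
The following is a minimal sketch, not the notebook code, of how the chi‐square statistic and Cramér’s V can be computed with scipy/pandas; the DataFrame df and the column names "Gender" and "Dropout" are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_and_cramers_v(x: pd.Series, y: pd.Series):
    """Chi-square test of independence plus Cramér's V effect size."""
    table = pd.crosstab(x, y)                          # contingency table
    chi2, p, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))   # Cramér's V
    return chi2, p, v

# e.g. association between gender and dropout (column names are assumptions)
chi2, p, v = chi2_and_cramers_v(df["Gender"], df["Dropout"])
print(f"chi2 = {chi2:.2f}, p = {p:.4g}, Cramér's V = {v:.2f}")
```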

2. Baseline Models

Logistic Regression
• Focused on maximizing recall (to minimize false negatives, i.e., missed dropouts).
• Metrics: Accuracy = 0.88, Dropout Recall = 0.75, Precision = 0.88, F₁ = 0.81, AUC = 0.911.
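
A rough, recall‐oriented sketch of such a baseline (not the repository’s exact code); class_weight="balanced", the 0.5 threshold, and the X_train/X_test/y_train/y_test split are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Logistic regression tuned toward recall via balanced class weights
lr = LogisticRegression(max_iter=1000, class_weight="balanced")
lr.fit(X_train, y_train)

proba = lr.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)   # lowering the threshold trades precision for recall

print("Dropout recall:", recall_score(y_test, pred))
print("Precision:     ", precision_score(y_test, pred))
print("AUC:           ", roc_auc_score(y_test, proba))
```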

Random Forest
• Improved AUC = 0.926 but lower dropout recall = 0.70 (more false negatives).

XGBoost
• Highest absolute dropout catch: 214/284 → Recall = 0.75 (ties LR), Precision = 0.84, F₁ = 0.79, Accuracy ≈ 0.87.
• Selected as best “assistive” baseline prior to fairness interventions.
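
A sketch of an XGBoost baseline in the same spirit; the hyperparameters shown are illustrative, not the tuned values from the repository.

```python
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, roc_auc_score

# Gradient-boosted trees baseline (assumed hyperparameters)
xgb = XGBClassifier(n_estimators=300, max_depth=4,
                    learning_rate=0.1, eval_metric="logloss")
xgb.fit(X_train, y_train)

y_pred = xgb.predict(X_test)
print(classification_report(y_test, y_pred,
                            target_names=["Enrolled/Graduated", "Dropout"]))
print("AUC:", roc_auc_score(y_test, xgb.predict_proba(X_test)[:, 1]))
```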

Keras Neural Network
• Accuracy = 0.88, Dropout Recall = 0.71, Precision = 0.88, F₁ = 0.79
• Did not outperform XGBoost or LR on recall; increased complexity without clear gain.
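
For completeness, a minimal Keras sketch of a feed‐forward baseline; layer sizes, dropout rate, and training settings are assumptions rather than the project’s exact architecture.

```python
import tensorflow as tf

# Small fully connected network with a sigmoid output for dropout probability
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall(name="recall")])
model.fit(X_train, y_train, epochs=30, batch_size=32,
          validation_split=0.2, verbose=0)
```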

3. Fairness Audit & Pre‐Processing

IBM AI Fairness 360 Toolkit (Bellamy et al., 2019)
• Demographic Parity (DP): Difference in “non‐dropout” prediction rates (male vs female).
• Equal Opportunity (EO): Difference in true‐positive rates (i.e., recall) across gender.
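
Both metrics can be computed with AIF360 roughly as follows; df_test (a fully numeric test‐split DataFrame), the group encoding, and the column names are assumptions made for this sketch, and which label counts as “favorable” (dropout vs. non‐dropout) follows the project’s own convention.

```python
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import ClassificationMetric

priv, unpriv = [{"Gender": 1}], [{"Gender": 0}]     # group encoding is assumed

ds_true = BinaryLabelDataset(df=df_test, label_names=["Dropout"],
                             protected_attribute_names=["Gender"])
ds_pred = ds_true.copy()
ds_pred.labels = y_pred.reshape(-1, 1)              # model predictions on the test set

metric = ClassificationMetric(ds_true, ds_pred,
                              unprivileged_groups=unpriv,
                              privileged_groups=priv)
print("Demographic parity difference:", metric.statistical_parity_difference())
print("Equal opportunity difference: ", metric.equal_opportunity_difference())
```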

Pre‐Processing Mitigation

Reweighing (Kamiran & Calders, 2012)
• Adjusts instance weights so that gender groups carry equal statistical weight.
• Post‐reweighing metrics: ΔDP = 0.178, ΔEO = 0.022, Accuracy = 0.88, Recall = 0.76.

Disparate Impact Remover (DI Remover) (Feldman et al., 2015)
• Transforms feature distributions to approximate parity across gender groups.
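
A sketch of how both pre‐processing steps are typically applied with AIF360, continuing the objects from the previous sketch; ds_train is an assumed BinaryLabelDataset built from the training split, and priv/unpriv are the group definitions above.

```python
from aif360.algorithms.preprocessing import Reweighing, DisparateImpactRemover

# Reweighing: learn instance weights that balance gender x outcome groups
rw = Reweighing(unprivileged_groups=unpriv, privileged_groups=priv)
ds_train_rw = rw.fit_transform(ds_train)
sample_weight = ds_train_rw.instance_weights      # pass to model.fit(..., sample_weight=...)

# Disparate Impact Remover: repair feature distributions toward parity
di = DisparateImpactRemover(repair_level=1.0)     # repair_level in [0, 1] is a tunable assumption
ds_train_di = di.fit_transform(ds_train)
X_train_repaired = ds_train_di.features
```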

4. In‐Processing & Post‐Processing Mitigation

In‐Processing (Exponentiated Gradient) (Agarwal et al., 2018)
• Trains a fairness‐constrained classifier to minimize loss subject to DP or EO constraints.
• Results: ΔDP ≈ 0.02, ΔEO ≈ 0.35 (unsatisfactory EO), Accuracy = 0.826.
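
A minimal Fairlearn sketch of this reduction approach; the base estimator, the constraint object, and the use of the raw Gender column as the sensitive feature are illustrative assumptions.

```python
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.linear_model import LogisticRegression

# Fairness-constrained training via the exponentiated-gradient reduction
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(max_iter=1000),
    constraints=DemographicParity(),     # TruePositiveRateParity() would target EO instead
)
mitigator.fit(X_train, y_train, sensitive_features=X_train["Gender"])
y_pred_fair = mitigator.predict(X_test)
```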

Post‐Processing (Threshold Optimizer) (Hardt, Price & Srebro, 2016)
• Uses a trained model and adjusts decision thresholds per group to satisfy EO.
• Combined with DI Remover: Pre‐DP = +0.120, Pre‐EO = +0.072, Post‐DP = +0.095, Post‐EO = +0.079, Accuracy = 0.866.
• When optimized, yields dropout recall = 0.835, precision = 0.714 at threshold = 0.20, ΔDP = 0.125, ΔEO = 0.012.
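
A sketch of group‐specific threshold adjustment with Fairlearn’s ThresholdOptimizer, wrapped around the trained baseline; the xgb estimator and the constraint/objective choices follow the earlier sketches and are assumptions.

```python
from fairlearn.postprocessing import ThresholdOptimizer

# Per-group decision thresholds on top of an already-fitted classifier
postproc = ThresholdOptimizer(
    estimator=xgb,                              # already-fitted baseline
    constraints="true_positive_rate_parity",    # equal opportunity across gender
    objective="balanced_accuracy_score",
    prefit=True,
    predict_method="predict_proba",
)
postproc.fit(X_train, y_train, sensitive_features=X_train["Gender"])
y_pred_post = postproc.predict(X_test, sensitive_features=X_test["Gender"])
```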

5. SHAP Explainability

SHAP (SHapley Additive exPlanations) (Lundberg & Lee, 2017)
• Global Importance (Beeswarm & Bar Charts) (Fig. 15–19)
– Top features by mean |SHAP|:
1. Approved 2nd‐semester credits (mean |SHAP| ≈ 0.67)
2. Tuition fees up to date (≈ 0.20)
3. Approved 1st‐semester credits (≈ 0.15)
4. 2nd‐semester grade (≈ 0.10)
5. Course of study (≈ 0.10)
– Demographic features contribute only marginally (mean |SHAP| for Gender_1 ≈ 0.01, Scholarship_holder_1 ≈ 0.01).
• Individual Force & Waterfall Plots (Fig. 24–25)
– Example A (high‐risk): Final risk = 0.879; key drivers: no 2nd‐semester passes (+0.38), high‐risk course (+0.10), weak 1st‐semester performance (3 passes, +0.08).
– Example B (low‐risk): Final risk = 0.020; protective factors: low‐risk course (–0.28), high admission grade (–0.12).
• Decision Paths & Cluster Analysis (Fig. 20–21, 26)
– Three risk cohorts: high (SHAP > +1.0), medium (≈ [–0.2, +0.2]), and secondary high.
– Suggests tiered intervention: urgent outreach, routine monitoring, targeted follow‐up.
• Interactions (Fig. 22–23)
– Age: Older students derive larger negative shifts per approved credit.
– Grade: Higher grades amplify credit’s protective effect.
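
A minimal sketch of the SHAP workflow behind plots like these, assuming the tree‐based baseline (xgb) and the test features from the earlier sketches.

```python
import shap

# Tree SHAP values for the gradient-boosted baseline
explainer = shap.TreeExplainer(xgb)
shap_values = explainer(X_test)            # shap.Explanation object

shap.plots.bar(shap_values)                # global importance (mean |SHAP| per feature)
shap.plots.beeswarm(shap_values)           # importance plus direction of effect
shap.plots.waterfall(shap_values[0])       # single-student explanation
```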

Key Takeaways

Statistical Rigor: Chi‐square (χ²) tests of association and Cramér’s V effect sizes confirmed which sensitive features carry real signal.

Model Diversity: We compared linear (LR), ensemble (RF), gradient boosting (XGB), and neural architectures—identifying the optimal trade‐off between recall and overall performance.

Fairness Engineering: Hands‐on implementation of pre‐processing (Reweighing, Disparate Impact Remover), in‐processing (Exponentiated Gradient), and post‐processing (Equalized Odds) methods, illustrating real‐world trade‐offs between Demographic Parity and Equal Opportunity (Hardt et al., 2016; Pleiss et al., 2017).

Explainability Focus: Extensive SHAP‐based analysis shows how features—especially academic progress—drive predictions and how to interpret individual risk scores for targeted interventions (Lundberg & Lee, 2017).

By integrating accuracy, fairness, and interpretability, this project demonstrates an end‐to‐end pipeline for developing trustworthy, equitable predictive models in high‐stakes educational settings.