This project implements a Random Forest Classifier to solve a multi-class classification problem: predicting a student's stress level (0: Low, 1: Medium, or 2: High) based on 20 features. These features span psychological, academic, and environmental factors crucial to student well-being.
The project demonstrates a complete machine learning workflow: meticulous data cleaning, robust outlier treatment, model training, and rigorous evaluation.
Dataset link: https://www.kaggle.com/code/mdsultanulislamovi/student-stress-factors-dataset-analysis
The model is trained on the StressLevelDataset.csv (1100 records). The target variable, stress_level, is well-balanced across its three classes, which is key for reliable model training.
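As a quick illustration of loading the data and checking the class balance described above, here is a minimal sketch (the file path and the exact commands are assumptions, not the notebook's verbatim code):

```python
import pandas as pd

# Load the 1100-record dataset and inspect the target distribution.
df = pd.read_csv("StressLevelDataset.csv")
print(df.shape)                            # expected: (1100, 21) -- 20 features + stress_level
print(df["stress_level"].value_counts())   # counts should be roughly even across classes 0, 1, 2
```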
| Feature Type | Key Features |
|---|---|
| Psychological/Health | `anxiety_level`, `depression`, `self_esteem`, `sleep_quality` |
| Academic/Career | `study_load`, `academic_performance`, `future_career_concerns` |
| Environmental/Social | `noise_level`, `social_support`, `bullying` |
The pipeline was executed in Python using Scikit-learn, with attention to data quality and model fairness (a code sketch of the main steps follows this list):

- **Data Quality Check:** Confirmed no missing values (`Non-Null Count = 1100`) and uniform `int64` data types, so the dataset was immediately ready for numerical preprocessing.
- **Outlier Treatment:** Applied the Interquartile Range (IQR) method to cap outliers in `noise_level`, `living_conditions`, and `study_load`. This step keeps extreme values from skewing the Random Forest's splits.
- **Model Training (Random Forest):**
  - `n_estimators=500` for high predictive stability.
  - `class_weight='balanced'` to adjust for any slight class imbalance, so the model treats all three stress levels fairly.
- **Split:** 80/20 train-test split (`random_state=42`) for reproducible results.
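Below is a minimal sketch of the preprocessing and training steps listed above, assuming the variable names shown here (the actual notebook code may differ in details):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("StressLevelDataset.csv")

# Cap outliers with the 1.5 * IQR rule in the three affected columns.
for col in ["noise_level", "living_conditions", "study_load"]:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 80/20 train-test split with a fixed seed for reproducibility.
X = df.drop(columns="stress_level")
y = df["stress_level"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 500 trees with balanced class weights, as described above.
model = RandomForestClassifier(n_estimators=500, class_weight="balanced")
model.fit(X_train, y_train)
```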
The Random Forest Classifier achieved excellent performance on the test set:
| Metric | Score |
|---|---|
| Overall Accuracy | 0.87 (87%) |

The balanced F1-scores of 0.86 to 0.88 across the three classes show that the model predicts Low, Medium, and High stress with similar reliability:
```
              precision    recall  f1-score   support

           0       0.87      0.86      0.86        76
           1       0.90      0.86      0.88        73
           2       0.84      0.89      0.86        71

    accuracy                           0.87       220
   macro avg       0.87      0.87      0.87       220
weighted avg       0.87      0.87      0.87       220
```
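For reference, a report like the one above can be produced with Scikit-learn's standard metrics; this short sketch assumes the `model`, `X_test`, and `y_test` objects from the training sketch earlier:

```python
from sklearn.metrics import accuracy_score, classification_report

# Score the held-out 20% and print per-class precision, recall, and F1.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```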
The project was built with:

- Python
- Jupyter Notebook / Google Colab
- Pandas & NumPy
- Scikit-learn
- Matplotlib & Seaborn