
A multi-class classification project utilizing a Random Forest Classifier to accurately predict a student's stress level (Low, Medium, or High) based on 20 psychological and academic factors, achieving a high prediction accuracy of ≈87% on the test data. 🌲📊


shlokshukla200/ML-Random_Forest


📚 Student Stress Predictor using Random Forest

🌟 Overview

This project implements a Random Forest Classifier to solve a multi-class classification problem: predicting a student's stress level (0: Low, 1: Medium, or 2: High) based on 20 features. These features span psychological, academic, and environmental factors crucial to student well-being.

The project demonstrates a complete machine learning workflow: data quality checks, IQR-based outlier treatment, model training, and evaluation on a held-out test set.


💾 Dataset Details

Dataset link: https://www.kaggle.com/code/mdsultanulislamovi/student-stress-factors-dataset-analysis

The model is trained on StressLevelDataset.csv (1100 records). The target variable, stress_level, is well balanced across its three classes, which helps the model learn each class equally well.

| Feature Type | Key Features | Example Range |
| --- | --- | --- |
| Psychological/Health | anxiety_level, depression, self_esteem, sleep_quality | $0 - 30$ |
| Academic/Career | study_load, academic_performance, future_career_concerns | $0 - 5$ |
| Environmental/Social | noise_level, social_support, bullying | $0 - 5$ |
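The balance and completeness claims above are easy to verify before training. The following is a minimal sketch, assuming the data is loaded into a pandas DataFrame; the helper name `summarize_quality` and the tiny `demo` frame are hypothetical stand-ins for StressLevelDataset.csv, not part of the original notebook.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame, target: str) -> dict:
    """Collect basic data-quality facts: size, missing values, dtypes, class shares."""
    return {
        "n_rows": len(df),
        "n_missing": int(df.isna().sum().sum()),
        "dtypes": sorted({str(t) for t in df.dtypes}),
        # Share of rows per class; similar shares indicate a balanced target.
        "class_share": (
            df[target].value_counts(normalize=True).sort_index().round(3).to_dict()
        ),
    }


# Hypothetical six-row frame standing in for the real CSV (assumption).
demo = pd.DataFrame(
    {
        "anxiety_level": [3, 14, 20, 5, 12, 18],
        "sleep_quality": [5, 2, 1, 4, 3, 1],
        "stress_level": [0, 1, 2, 0, 1, 2],
    }
)

report = summarize_quality(demo, target="stress_level")
print(report)
```

On the real file, the same call would be `summarize_quality(pd.read_csv("StressLevelDataset.csv"), "stress_level")`, assuming the CSV sits in the working directory.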

πŸ› οΈ Methodology & Technical Execution

The pipeline was executed in Python using Scikit-learn, demonstrating attention to data quality and model fairness:

  1. Data Quality Check: Confirmed no missing values (Non-Null Count = 1100) and uniform int64 data types, ensuring the dataset was immediately ready for numerical preprocessing.
  2. Outlier Treatment: Applied the Interquartile Range (IQR) method to cap outliers in noise_level, living_conditions, and study_load, limiting the influence of extreme values on the model's split thresholds.
  3. Model Training (Random Forest):
    • Used n_estimators=500 for high predictive stability.
    • Applied class_weight='balanced', which weights each class inversely to its frequency, to offset the slight class imbalance across the three stress levels.
  4. Split: $80/20$ train-test split (random_state=42) for reproducible results.
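The steps above can be sketched end to end as follows. This is an illustrative reconstruction, not the notebook's actual code: the data is synthetic, the helper `cap_outliers_iqr` and the multiplier `k=1.5` are assumptions (1.5 is the conventional IQR fence), and `random_state=42` on the model is added only to make the sketch repeatable. The parameters `n_estimators=500`, `class_weight='balanced'`, and the 80/20 split with `random_state=42` come from the list above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def cap_outliers_iqr(df, columns, k=1.5):
    """Clip each listed column to [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is an assumption."""
    capped = df.copy()
    for col in columns:
        q1, q3 = capped[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        capped[col] = capped[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return capped


# Synthetic stand-in for the real dataset (assumption): three features, three classes.
rng = np.random.default_rng(42)
n = 300
stress = rng.integers(0, 3, size=n)
data = pd.DataFrame(
    {
        "noise_level": np.clip(stress + rng.integers(0, 3, size=n), 0, 5),
        "study_load": np.clip(stress + rng.integers(0, 3, size=n), 0, 5),
        "living_conditions": rng.integers(2, 4, size=n),
        "stress_level": stress,
    }
)
data.loc[0, "living_conditions"] = 50  # inject one extreme value to be capped

feature_cols = ["noise_level", "study_load", "living_conditions"]
clean = cap_outliers_iqr(data, feature_cols)

X, y = clean[feature_cols], clean["stress_level"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(
    n_estimators=500, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"synthetic test accuracy: {test_accuracy:.3f}")
```

The sketch caps outliers before splitting, following the order of the list. Computing the IQR bounds from the training rows only and then applying them to the test rows would keep test data out of preprocessing entirely; whether the original notebook does this is not stated.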

✅ Model Performance & Key Results

The Random Forest Classifier reached about 87% accuracy on the 220-record test set:

| Metric | Score |
| --- | --- |
| Overall Accuracy | $0.868$ ($\approx 86.8\%$) |

The balanced F1-scores of $\approx 0.86 - 0.88$ across all three classes (Low, Medium, High) confirm the model's reliability in handling this multi-class prediction task.

Detailed Classification Report

                  precision    recall  f1-score   support

               0       0.87      0.86      0.86        76
               1       0.90      0.86      0.88        73
               2       0.84      0.89      0.86        71

        accuracy                           0.87       220
       macro avg       0.87      0.87      0.87       220
    weighted avg       0.87      0.87      0.87       220
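As a consistency check, the F1 column and the two average rows can be recomputed from the precision, recall, and support values in the report. The sketch below uses only the rounded figures printed above, so it can agree with the report to two decimals at best; the variable names are illustrative.

```python
# (precision, recall, support) per class, copied from the report above.
report_rows = {
    0: (0.87, 0.86, 76),
    1: (0.90, 0.86, 73),
    2: (0.84, 0.89, 71),
}


def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_class = {label: f1(p, r) for label, (p, r, _) in report_rows.items()}
total_support = sum(s for _, _, s in report_rows.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = (
    sum(f1_by_class[label] * s for label, (_, _, s) in report_rows.items())
    / total_support
)

for label, value in f1_by_class.items():
    print(f"class {label}: F1 = {value:.2f}")
print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")
```

Because the three supports (76, 73, 71) are nearly equal, the macro and weighted averages almost coincide, which is why both rows of the report show the same values.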


💻 Tech Stack

  • Python
  • Jupyter Notebook / Google Colab
  • Pandas & NumPy
  • Scikit-learn
  • Matplotlib & Seaborn
