This project focuses on predicting academic performance among secondary school students using machine learning. The goal is to:
- Predict students' Grade Point Average (GPA)
- Classify students as at-risk based on predicted GPA (below a threshold)
- Compare classification model performance
- Uncover key factors that influence student success
This project empowers educators with early warnings and data-driven insights by combining academic, behavioral, and demographic features.
The dataset contains a range of features such as:
- Demographics (e.g., gender, ethnicity, age)
- Academic indicators (e.g., study time, absences, parental education)
- Behavioral traits (e.g., extracurriculars, volunteering, tutoring)
📄 A full breakdown of features, data types, and example values can be found in the data catalog under the docs/
folder.
- GPA Prediction: Estimate students' GPA using regression techniques
- At-Risk Classification:
- GPA-based risk flagging (Flag students as "At Risk" when the predicted GPA is below 2.0)
- Direct classification using trained classifiers
- Model comparison view included in the Streamlit app
- Feature Importance: Analyze key factors driving academic performance
- Streamlit App: Provide a simple user interface for live predictions
- Insights for Educators: Offer actionable data to improve student outcomes
git `clone` https://github.com/mandele1999/student_performance_project.git
cd student_performance_project
pip install -r requirements.txt
cd scripts
streamlit run streamlit_app.py
student_performance_project/
│
├── data/
│ ├── raw/
│ ├── processed/
│ └── feature_engineered_student_data.csv
│
├── docs/
│ ├── data_catalog.md
│ ├── app_guide.md
| ├── model_report.md
│ └── project_overview.md
│
├── models/
│ ├── logreg_risk_classifier_model.pkl
│ ├── lr_student_grade_model.pkl
│ └── preprocessor_pipeline.pkl
│
├── notebooks/
│ ├── 01_data_preprocessing.ipynb
│ ├── 02_feature_engineering.ipynb
│ └── 03_modeling.ipynb
│
├── scripts/
│ ├── custom_transformers.py
│ ├── inference.py
│ └── streamlit_app.py
│
├── requirements.txt
├── README.md
├── .gitignore
└── LICENSE
Once the app is running:
- Enter student details (age, study time, absences, etc.)
- Predict GPA using a trained Linear Regression model
- See two risk classification results:
- Based on predicted GPA threshold
- From a trained classifier model
- Compare model outputs in a single view
- Get instant feedback on potential student performance
- Best Model: Linear Regression for GPA prediction
- Risk Classification: Compare both GPA-threshold-based and trained classifier models
- Top Influencers: Parental education, study time, absences, and support systems emerged as significant predictors
- App Feedback: Real-time, easy-to-understand results to assist decision-makers in schools
Contributions are welcome! If you'd like to add improvements, raise issues, or contribute new ideas (e.g., dashboard features, model tuning, or alternative algorithms):
- Fork the repository
- Create a new branch (
git checkout -b feature-xyz
) - Make your changes and commit (
git commit -am 'Add new feature'
) - Push the branch (
git push origin feature-xyz
) - Create a Pull Request
This project is licensed under the MIT License.
- Open-source Python tools and libraries
- Contributors and mentors from the data science community
- Student Performance Dataset on Kaggle