Predicting Students' Math Scores

Overview

This repository contains code for building a machine learning model to predict students' math scores based on various features such as gender, race/ethnicity, parental level of education, lunch type, and test preparation course.

The aim is to build an accurate model while writing production-level code and data pipelines that cover data acquisition, preprocessing, and prediction.

Dataset

The dataset used for this project consists of the following columns:

- Gender
- Race/Ethnicity
- Parental Level of Education
- Lunch Type
- Test Preparation Course
- Math Score (target variable)
- Reading Score
- Writing Score
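
The columns above mix categorical features with numeric scores, so preprocessing typically encodes the former and scales the latter. The following is a minimal sketch of that idea, not the repository's actual transformation code. The snake_case column names and the four sample rows are assumptions made up for illustration; the real CSV headers and values may differ.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real CSV headers may differ.
CATEGORICAL = [
    "gender",
    "race_ethnicity",
    "parental_level_of_education",
    "lunch",
    "test_preparation_course",
]
NUMERIC = ["reading_score", "writing_score"]
TARGET = "math_score"

# Made-up rows purely for illustration (not taken from the dataset).
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "race_ethnicity": ["group A", "group B", "group B", "group C"],
    "parental_level_of_education": [
        "high school", "some college", "high school", "bachelor's degree",
    ],
    "lunch": ["standard", "free/reduced", "standard", "standard"],
    "test_preparation_course": ["none", "completed", "none", "completed"],
    "reading_score": [70, 55, 80, 90],
    "writing_score": [72, 50, 78, 88],
    "math_score": [65, 48, 75, 92],
})

# Separate the features from the target variable.
X = df.drop(columns=[TARGET])
y = df[TARGET]

# One-hot encode categorical columns and standardize numeric ones.
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ("num", StandardScaler(), NUMERIC),
])

X_transformed = preprocessor.fit_transform(X)
print(X_transformed.shape)
```

Setting `handle_unknown="ignore"` is a design choice for this sketch: it lets the encoder accept categories at prediction time that were absent during fitting.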

Machine Learning Models Utilized

Several machine learning algorithms were explored to develop the predictive model. The models used include:

- Linear Regression
- Ridge Regression
- Lasso Regression
- Support Vector Regression (SVR)
- Decision Tree Regression
- Random Forest Regression
- K-Nearest Neighbors Regression
- Gradient Boosting Regression
- AdaBoost Regression
- CatBoost Regression
- XGBoost Regression
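
One common way to compare such a list of regressors is to fit each on the same training split and rank them by R-squared on held-out data. The sketch below illustrates that pattern under stated assumptions: it uses synthetic data from `make_regression` rather than the student dataset, default hyperparameters, and only the scikit-learn models (CatBoost and XGBoost ship as separate packages and are omitted here).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the project itself uses the student dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate models with default settings (an assumption for this sketch).
models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "SVR": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
}

# Fit every model on the same split and record its test-set R2.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)

# Print the models from best to worst R2.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} R2 = {score:.3f}")
print("Best model:", best_name)
```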

Model Evaluation and Selection

To identify the best-performing models, hyperparameters were tuned with randomized search cross-validation (RandomizedSearchCV). Mean squared error (MSE), root mean squared error (RMSE), and R-squared were used to assess predictive accuracy and generalization.
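
As a rough illustration of this tuning and evaluation step, the sketch below runs `RandomizedSearchCV` over a random forest and then reports MSE, RMSE, and R-squared on a held-out split. The search space, the choice of estimator, the number of iterations, and the synthetic data are all assumptions for the example; the project's actual grids are not documented here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data; the project itself uses the student dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative search space (an assumption, not the project's grid).
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [None, 4, 8],
    "min_samples_split": [2, 5, 10],
}

# Sample 5 random parameter combinations, each scored by 3-fold CV on R2.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refitted best estimator on the held-out test split.
y_pred = search.best_estimator_.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = float(np.sqrt(mse))  # RMSE is the square root of MSE
r2 = r2_score(y_test, y_pred)

print("Best parameters:", search.best_params_)
print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}, R2 = {r2:.3f}")
```

Randomized search evaluates only a fixed number of sampled combinations, which keeps tuning time bounded when many models and parameters are being explored.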

Final Model Selection

After evaluation, the best-performing models were selected based on the metrics above. These models were then potentially combined or further fine-tuned to produce the final model for estimating students' math scores from the given features.

Summary

This project builds an end-to-end data processing and modeling pipeline using Python and Flask, with Docker and AWS for deployment. Tasks include EDA, feature engineering, model training, website creation, Dockerization, CI/CD implementation, and AWS deployment setup. The objective is to deliver a robust, scalable solution for data analysis and predictive analytics.

Project Structure

Setup GitHub and Local Folder
- Create GitHub repo and .gitignore
- Create venv
- Create setup.py
- Create requirements.txt
Create Source Code Structure
- Create src directory and build the package (requirements.txt)
  - Create component files: data_ingestion.py, data_transformation.py, model_trainer.py
  - Create pipeline files: predict_pipeline.py, train_pipeline.py
  - Create exception, logger, and utils files: exceptions.py, logger.py, utils.py
Exploratory Data Analysis (EDA) in Jupyter Notebook
- Perform EDA
- Handle missing values
- Remove duplicate values
- Data cleaning
- Data imputation
- Feature engineering
- Train-test split
- Identify best performing models
- Model evaluation (R2)
Create Simple Webpage for User Input
Write Modular Code Based on the Jupyter Notebook and Test It on a Local Server (Flask)
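
The exception and logger files listed above are usually small helpers shared by the component and pipeline modules. The sketch below shows one way such helpers could look, using only the standard library. The names `CustomException`, `logger`, and `safe_divide` are hypothetical and chosen for illustration; the repository's own files may be organized differently.

```python
import logging
import sys

# Hypothetical logger setup, loosely mirroring what logger.py might contain.
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("student_performance")


class CustomException(Exception):
    """Wraps an error with the file and line where it was caught."""

    def __init__(self, error: Exception):
        # exc_info() returns the exception currently being handled, if any.
        _, _, tb = sys.exc_info()
        location = "unknown location"
        if tb is not None:
            location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
        self.message = f"{type(error).__name__} at {location}: {error}"
        super().__init__(self.message)


def safe_divide(numerator: float, denominator: float) -> float:
    """Toy pipeline step used only to demonstrate the wrapper."""
    try:
        return numerator / denominator
    except Exception as exc:
        logger.error("Step failed: %s", exc)
        raise CustomException(exc) from exc


if __name__ == "__main__":
    try:
        safe_divide(1, 0)
    except CustomException as err:
        print(err)
```

Wrapping errors this way keeps the original exception chained via `from exc` while adding location details to the message that ends up in the logs.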

Docker Configuration and Deployment

Docker setup and configuration

sudo apt-get update -y
sudo apt-get upgrade

curl -fsSL https://get.docker.com -o get-docker.sh

sudo sh get-docker.sh

sudo usermod -aG docker ubuntu

newgrp docker

Build Docker image

Configure GitHub Workflow and CI/CD Action Runner

Setup AWS Resources for Deployment
- Create and configure IAM user
- Set up Amazon ECR repository
- Provision EC2 instance for deployment
