This project predicts whether an individual earns more than $50K annually from demographic and employment-related attributes in the UCI Adult dataset. We compare four machine learning models (Logistic Regression, Random Forest, MLP, and SVM) to determine the most effective approach.
The dataset used is the Adult Income Dataset from the UCI Machine Learning Repository:
- 32,561 training instances and 16,281 test instances
- Features include age, workclass, education, marital status, occupation, race, sex, hours-per-week, and more
- Target variable: `>50K` or `<=50K`
- Categorical: `workclass`, `education`, `marital-status`, `occupation`, `relationship`, `race`, `sex`, `native-country`
- Numerical: `age`, `fnlwgt`, `education-num`, `capital-gain`, `capital-loss`, `hours-per-week`
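
The feature lists above map onto the column layout of the UCI data files. As a minimal sketch of loading them with pandas: the helper `load_adult` and the target column name `income` are assumptions made for illustration and are not taken from the project's notebooks. An in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Column order of the UCI Adult data files. "income" is a name chosen
# here for the target column (assumption, not from the project).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]


def load_adult(source):
    """Read Adult-format CSV data (no header row) from a path or file-like object."""
    df = pd.read_csv(
        source,
        header=None,
        names=COLUMNS,
        skipinitialspace=True,  # fields are separated by ", "
        na_values="?",          # "?" marks a missing value
    )
    # Labels in the test split end with a period (">50K."); strip it.
    df["income"] = df["income"].str.rstrip(".")
    return df


# Two hand-written rows in the same format as the real files.
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, ?, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, "
    "Husband, White, Male, 0, 0, 45, United-States, >50K.\n"
)
df = load_adult(sample)
print(df[["age", "workclass", "income"]])
```

Passing a file path instead of `sample` would read the downloaded data the same way.
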
Model | Description |
---|---|
Logistic Regression | Baseline linear model |
Random Forest | Ensemble model with decision trees |
Multi-layer Perceptron (MLP) | Neural network for classification |
Support Vector Machine (SVM) | Kernel-based classification |
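
As a rough sketch, the four models in the table could be set up with scikit-learn as shown here. The choice of scikit-learn and every hyperparameter value are assumptions for illustration; the project's notebooks may configure the models differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hyperparameters are illustrative placeholders, not the project's settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=42),
    "SVM": SVC(kernel="rbf"),
}

for name, model in models.items():
    print(name, "->", type(model).__name__)
```

Keeping the estimators in one dictionary makes it easy to fit and score them in a single loop.
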
- Handled missing values (`?`) by removal or imputation
- Label encoding and one-hot encoding for categorical features
- Feature scaling for numerical values (where required)
- Train-test split (if not already provided)
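
The preprocessing steps above can be sketched with a scikit-learn `ColumnTransformer`. This is one possible arrangement under stated assumptions: imputation with the most frequent value (rather than row removal), a reduced set of columns for brevity, and a toy DataFrame in place of the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Subset of the dataset's columns, kept short for the example.
categorical = ["workclass", "occupation"]
numerical = ["age", "hours-per-week"]

preprocess = ColumnTransformer([
    # Fill missing categories, then expand each category into a 0/1 column.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
    # Standardize numerical columns to zero mean and unit variance.
    ("num", StandardScaler(), numerical),
])

# Toy rows; NaN plays the role of the "?" entries after loading.
toy = pd.DataFrame({
    "workclass": ["Private", np.nan, "State-gov", "Private"],
    "occupation": ["Sales", "Tech-support", "Sales", np.nan],
    "age": [25, 38, 52, 41],
    "hours-per-week": [40, 50, 40, 35],
})
X = preprocess.fit_transform(toy)

# Encode the target as 1 for ">50K" and 0 for "<=50K".
y = (pd.Series(["<=50K", ">50K", ">50K", "<=50K"]) == ">50K").astype(int)

print(X.shape, list(y))
```

Fitting the transformer on the training split only, then applying it to the test split, avoids leaking test statistics into the scaler and imputer.
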
- Accuracy
- Precision
- Recall
- F1-Score
- ROC-AUC
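
For reference, the metrics listed above can be computed with `sklearn.metrics`. The labels and scores below are made up for illustration and are not project results; the 0.5 decision threshold is likewise an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth (1 means ">50K") and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_score = [0.1, 0.4, 0.2, 0.6, 0.8, 0.3, 0.9, 0.2]

# Turn probabilities into hard labels with a 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC-AUC is computed from the scores, not the thresholded labels.
    "roc_auc": roc_auc_score(y_true, y_score),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the two income classes are not equally frequent, precision, recall, F1, and ROC-AUC give a fuller picture than accuracy alone.
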
Model | Accuracy |
---|---|
Logistic Regression | 76.80% |
Random Forest | 93.90% |
MLP | 82.80% |
SVM | 89.80% |
- Clone the repository and enter its directory: `git clone https://github.com/SanyamBK/ML-Project-Census-Income-Prediction.git` followed by `cd ML-Project-Census-Income-Prediction`
- Install dependencies: `pip install -r requirements.txt`
- Run the Jupyter notebooks to train and evaluate models.
- Hyperparameter tuning
- Feature selection and dimensionality reduction
- Handling class imbalance
- Deployment using FastAPI or Streamlit
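
Two of the items above, hyperparameter tuning and class imbalance handling, could start from something like the following sketch. It is not part of the current project: the synthetic data from `make_classification`, the parameter grid, and the use of `class_weight="balanced"` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.75, 0.25], random_state=0
)

search = GridSearchCV(
    # class_weight="balanced" reweights classes inversely to their frequency.
    LogisticRegression(class_weight="balanced", max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},  # illustrative grid
    scoring="f1",                        # F1 is less misleading than accuracy here
    cv=5,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the other models by swapping the estimator and its parameter grid.
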
This project is for academic use and learning purposes.