This project implements predictive models that classify adult income from U.S. Census Bureau data. The goal is to predict whether a person earns more than $50K per year using machine learning techniques.

File | Description |
---|---|
FinalProject.ipynb | Jupyter Notebook with complete code and outputs |
README.md | Project overview and usage instructions |
adult-dataset.csv | Data extracted from the Census Bureau database, found at: http://www.census.gov/ftp/pub/DES/www/welcome.html |

The dataset contains demographic and employment-related attributes for U.S. adults. The target variable is `Income`: `<=50K` or `>50K`.
- Age (int)
- Work Class (categorical)
- Education (categorical)
- Marital Status (categorical)
- Occupation (categorical)
- Race (categorical)
- Sex (binary)
- Hours per week (int)
- Handled missing data by removing or imputing invalid entries
- Converted categorical variables using one-hot encoding
- Removed irrelevant columns
- Ensured numeric data for model compatibility
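
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper is hypothetical, the `"?"` missing-value marker is an assumption about how invalid entries are encoded, and the tiny `demo` frame only stands in for `adult-dataset.csv`.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, target: str = "Income") -> tuple[pd.DataFrame, pd.Series]:
    """Clean a raw frame and return (numeric features, binary target)."""
    df = df.copy()
    for col in df.columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            # Trim stray whitespace, then blank out the assumed "?" marker.
            stripped = df[col].str.strip()
            df[col] = stripped.where(stripped != "?")
    # Remove rows that contain missing entries.
    df = df.dropna()
    # Binary target: 1 for ">50K", 0 for "<=50K".
    y = (df[target] == ">50K").astype(int)
    # One-hot encode the categorical columns and force everything to float.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=float).astype(float)
    return X, y


# Illustrative rows only; the real data lives in adult-dataset.csv.
demo = pd.DataFrame(
    {
        "Age": [39, 50, 38, 53],
        "Work Class": ["Private", " ?", "Private", "Self-emp"],
        "Hours per week": [40, 13, 40, 45],
        "Income": ["<=50K", ">50K", "<=50K", ">50K"],
    }
)
X, y = preprocess(demo)
print(X.shape, y.tolist())
```

Dropping rows is the simplest way to handle missing entries; imputing the most frequent category instead would keep more rows at the cost of some noise.
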
Applied both SVD and PCA to reduce dataset dimensions.
- Explained Variance: PCA components capturing >90% of variance
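
One way to reproduce the >90% explained-variance criterion with scikit-learn is sketched below. The feature matrix is synthetic, and the scaling step and the choice to reuse PCA's component count for `TruncatedSVD` are assumptions, not details taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the encoded features: 200 rows, 10 correlated columns.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

# PCA is scale-sensitive, so standardize first (an assumption about the pipeline).
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction.
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# TruncatedSVD needs an explicit component count; reuse the one PCA selected.
svd = TruncatedSVD(n_components=int(pca.n_components_), random_state=0)
X_svd = svd.fit_transform(X_scaled)

print(pca.n_components_, round(float(explained), 3))
```
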
- Built with `MLPClassifier`
- Evaluated using Confusion Matrix & Classification Report
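
A minimal sketch of training and evaluating an `MLPClassifier` with a confusion matrix and classification report. The data here is generated with `make_classification`, and the layer sizes, iteration limit, and split ratio are illustrative assumptions rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem standing in for the encoded census features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling helps the network converge; hyperparameters are placeholders.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predictions.
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=["<=50K", ">50K"])
print(cm)
print(report)
```
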
- Applied using Scikit-learn
- Good baseline model
- Fast and interpretable model
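
The results table names Logistic Regression and Naïve Bayes as the other supervised models. A hedged sketch of fitting both and collecting the same four metrics follows; `GaussianNB` is an assumed variant (the source does not say which Naïve Bayes model was used), and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary problem standing in for the encoded census features.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),  # assumed variant
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Metrics for the positive class (label 1, i.e. ">50K" in the real data).
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0
    )
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

for name, metrics in scores.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```
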
- Unsupervised clustering excluding `Income`
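
Clustering without the target column might look like the sketch below. It assumes K-Means (the unsupervised model named in the results table) with two clusters; the cluster count, the scaling step, and the six demo rows are illustrative assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative rows only; the real data lives in adult-dataset.csv.
df = pd.DataFrame(
    {
        "Age": [25, 28, 30, 52, 55, 60],
        "Hours per week": [20, 25, 22, 50, 55, 60],
        "Income": ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"],
    }
)

# Drop the target so the clusters are formed from the features alone.
features = df.drop(columns=["Income"])
X = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Income is only used afterwards, to inspect how clusters line up with it.
crosstab = pd.crosstab(labels, df["Income"], rownames=["cluster"])
print(crosstab)
```

Because K-Means never sees `Income`, accuracy-style metrics do not apply directly, which is consistent with the dashes in the K-Means row of the results table.
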
Model | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
MLP | ✅ High | ✅ High | ✅ High | ✅ High |
Logistic Regression | Moderate | Moderate | Moderate | Moderate |
Naïve Bayes | Moderate | Lower | Moderate | Moderate |
K-Means | - | - | - | - (unsupervised) |
pip install numpy pandas matplotlib seaborn scikit-learn