This project implements a California Housing Price Prediction model using Linear Regression and Random Forest Regressor. The dataset used is the California Housing dataset from `sklearn.datasets`. The goal is to predict median house prices from features such as income level, house age, and geographical location.
- Dataset Loading: The California Housing dataset is loaded using `fetch_california_housing(as_frame=True)`.
- Feature Engineering:
  - Standardization of the dataset using `StandardScaler`.
  - Splitting the data into training (80%) and testing (20%) sets.
  - Checking dataset structure and statistics.
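The loading and preprocessing steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual notebook code: the helper names `load_housing` and `split_and_scale`, the `random_state=42` seed for the split, and the choice to fit the scaler on the training split only are all assumptions. The README states only that the data is standardized and split 80/20.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def load_housing():
    """Load the California Housing data as (features DataFrame, target Series)."""
    housing = fetch_california_housing(as_frame=True)
    return housing.data, housing.target


def split_and_scale(X, y, test_size=0.2, random_state=42):
    """Hypothetical helper: 80/20 split, then standardize the features."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Assumption: the scaler is fitted on the training split only, so that
    # test-set statistics do not leak into preprocessing.
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled, y_train, y_test


# Example usage (the first call downloads the dataset):
# X, y = load_housing()
# X.info()                 # dataset structure
# print(X.describe())      # summary statistics
# X_train, X_test, y_train, y_test = split_and_scale(X, y)
```

Fitting the scaler on the training split alone is a common convention; if the original notebook scales the full dataset before splitting, its reported metrics may differ slightly from what this sketch produces.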
- Model Training: Uses `LinearRegression()` to fit the training data.
- Predictions: Predictions are made on the test set.
- Evaluation Metrics:
  - Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual prices.
  - Mean Squared Error (MSE): Averages the squared differences, so large errors are penalized more heavily.
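The Linear Regression steps above might look like the sketch below. The function name `fit_and_score_linear` and the dictionary it returns are hypothetical; only `LinearRegression()`, prediction on the test set, and the MAE/MSE metrics come from the description above.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error


def fit_and_score_linear(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit a linear model and score it on the test set."""
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predictions are made on the held-out test set only.
    y_pred = model.predict(X_test)

    return {
        "mae": mean_absolute_error(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
    }
```

Passing in the scaled training and test splits would return both metrics in one dictionary, which makes it easy to place them next to another model's scores.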
- Model Training: Uses `RandomForestRegressor(n_estimators=100, random_state=42)`.
- Predictions: Predictions are made using the trained Random Forest model.
- Evaluation Metrics: MSE, MAE, and R² score are calculated to compare model performance.
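A comparable sketch for the Random Forest steps is shown here. The constructor arguments are the ones listed above; the wrapper function `fit_and_score_forest` and its return format are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def fit_and_score_forest(X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit a random forest and score it on the test set."""
    # Hyperparameters as stated in the README; random_state fixes the
    # bootstrap sampling so repeated runs give the same result.
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    return {
        "mse": mean_squared_error(y_test, y_pred),
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
```

Tree-based models do not depend on feature scale, so standardization mainly matters for keeping the two models' inputs identical rather than for the forest itself.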
Model | Mean Squared Error (MSE) | Mean Absolute Error (MAE) | R² Score |
---|---|---|---|
Linear Regression | 0.55 | 0.53 | - |
Random Forest | 0.26 | 0.33 | 0.81 |
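Conclusion: The Random Forest model outperforms Linear Regression, with a lower MSE (0.26 vs. 0.55) and a lower MAE (0.33 vs. 0.53). Its R² score of 0.81 indicates that it explains about 81% of the variance in house prices. No R² score was recorded for Linear Regression, so the two models cannot be compared on that metric.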
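To make the three metrics concrete, the sketch below computes them by hand using only the standard library. The four values are made-up toy numbers, not results from this project, and `regression_metrics` is a hypothetical helper. R² is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R²) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    # R² = 1 - SS_res / SS_tot, where SS_tot measures the spread of the
    # true values around their own mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return mae, mse, r2


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae, mse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.2f}")
# prints: MAE=0.250  MSE=0.125  R2=0.90
```

Because MSE squares each error before averaging, a single large miss raises it much more than it raises MAE, which is why the table reports both.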
To run this project, install the required dependencies:
pip install pandas numpy scikit-learn matplotlib
1️⃣ Clone the repository:
git clone https://github.com/your-repo/California_Housing_Price_Prediction.git
cd California_Housing_Price_Prediction
2️⃣ Start Jupyter Notebook and open the project notebook:
jupyter notebook
3️⃣ Execute the cells step by step to see the data processing, model training, and evaluation.
- Code Crafters Bm: Project development and implementation.
- Inspired by `sklearn.datasets` and regression modeling techniques.