The goal is to predict median house prices in California with a linear regression model, using features such as income, location, and housing characteristics.
We used the California Housing Dataset provided by Scikit-learn.
Dataset Link
- Python
- Scikit-learn
- Pandas & NumPy
- Matplotlib / Seaborn (for visualization)
- Jupyter Notebook
- R² Score: 0.624. The model explains 62.4% of the variance in housing prices.
- Key Insight: Median Income is the most influential feature, as shown by:
  - Its high coefficient in the model
  - A clear positive trend in its scatter plot against house prices
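
As a rough illustration of how an R² score and the coefficient ranking might be obtained, here is a minimal sketch using Scikit-learn. The helper names `fit_and_score` and `report_california_housing`, the 80/20 split, and the random seed are assumptions made for this example and are not taken from the original notebook, so the score it prints may differ from the 0.624 reported above. Raw coefficients also depend on each feature's units, so ranking them by absolute size is only an approximate guide to influence.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, test_size=0.2, random_state=42):
    """Fit a linear regression and return (model, R² on the held-out split).

    The split ratio and seed are illustrative defaults, not the
    settings used in the original analysis.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


def report_california_housing():
    """Hypothetical helper: load the dataset, fit, and print a summary."""
    # Downloads the data on first use and caches it locally.
    housing = fetch_california_housing(as_frame=True)
    X, y = housing.data, housing.target  # target is in units of $100,000

    model, r2 = fit_and_score(X, y)
    print(f"R² on held-out data: {r2:.3f}")

    # List features from largest to smallest absolute coefficient.
    ranked = sorted(zip(X.columns, model.coef_), key=lambda pair: -abs(pair[1]))
    for name, coef in ranked:
        print(f"{name:>12}: {coef:+.4f}")
```

Calling `report_california_housing()` prints the held-out R² followed by one line per feature; in this dataset, Median Income appears as the `MedInc` column.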