
Mai3Prabhu/Multiple-Linear-Regression-with-Assumptions


🏡 Multiple Linear Regression – California Housing Dataset

This project demonstrates Multiple Linear Regression using the popular California Housing dataset from sklearn.datasets. It explores feature relationships, evaluates model performance with several metrics, and finally prepares the model for deployment by serializing it with pickle.


📌 Project Overview

Multiple Linear Regression helps us model the relationship between one dependent variable (target) and multiple independent variables (features). In this project, we aim to predict housing prices based on various features from the California Housing dataset.
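
In equation form, for district i with p = 8 features, the model being fit is shown below. scikit-learn's LinearRegression estimates the coefficients by ordinary least squares, i.e. by minimizing the sum of squared residuals; the closed form on the second line holds when XᵀX is invertible, with X including a leading column of ones for the intercept (numerically, the library uses a least-squares solver rather than forming the inverse).

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad p = 8

\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 = (X^\top X)^{-1} X^\top \mathbf{y}
```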


🗃️ Dataset Details

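  • 📦 Source: sklearn.datasets.fetch_california_housing
  • 🧮 Samples: 20,640
  • 🔢 Features: 8 numerical features (MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude)
  • 🎯 Target: Price (median house value per district, in units of $100,000)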
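
The "Load Dataset & Create DataFrame" step can be sketched as follows. This is a minimal illustration, not the notebook's code: the helper names bunch_to_frame and load_california_frame are hypothetical, and the target column is named Price to match this README. fetch_california_housing() downloads the data on first use and caches it locally.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing


def bunch_to_frame(bunch) -> pd.DataFrame:
    """Turn a scikit-learn dataset Bunch into a DataFrame with a 'Price' target column."""
    df = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    df["Price"] = bunch.target  # median house value, in units of $100,000
    return df


def load_california_frame() -> pd.DataFrame:
    """Download the dataset on first call (cached afterwards) and return it as a DataFrame."""
    return bunch_to_frame(fetch_california_housing())
```

With the real data, `df = load_california_frame()` should have shape (20640, 9), and `df.corr()` is the correlation matrix that the EDA heatmap visualizes, for example with `sns.heatmap(df.corr(), annot=True)`.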

🧠 Workflow

  1. Load Dataset & Create DataFrame

    • Loaded using fetch_california_housing()
    • Converted to a pandas DataFrame
  2. Exploratory Data Analysis

    • Used seaborn.pairplot() to visualize relationships
    • Created a heatmap to observe feature correlations
  3. Data Preparation

    • Split into train/test sets using train_test_split
    • Standardized features using StandardScaler
  4. Model Building

    • Trained a Multiple Linear Regression model using LinearRegression from scikit-learn
  5. Model Evaluation

    • Evaluated with:
      • Mean Squared Error (MSE)
      • Mean Absolute Error (MAE)
      • Root Mean Squared Error (RMSE)
      • R² Score
      • Adjusted R² Score
  6. Assumptions & Residual Analysis

    • Plotted residuals using:
      • seaborn.distplot() to check normality (distplot is deprecated since seaborn 0.11; histplot() and displot() replace it)
      • Scatter plot of residuals vs predictions to check homoscedasticity
    • Found that the model's predictive performance was not optimal and could be improved
  7. Model Deployment Prep

    • Serialized the trained model with pickle (pickle.dump)
    • Discussed its usage in cloud-based inference pipelines
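
Steps 3 to 6 can be sketched end to end as below. This is an illustration under stated assumptions rather than the notebook's exact code: the 70/30 split, random_state=42, and the function names train_and_evaluate and plot_residuals are hypothetical choices, adjusted R² is computed on the test set (n = test samples, p = features), and the residual histogram uses plain matplotlib in place of the deprecated seaborn.distplot().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X, y, test_size=0.3, random_state=42):
    """Split, standardize, fit OLS, and report test-set metrics (hypothetical helper)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # Fit the scaler on the training split only, then reuse it on the test split
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model = LinearRegression()
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)

    n, p = X_test_scaled.shape  # number of test samples, number of features
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    metrics = {
        "MSE": mse,
        "MAE": mean_absolute_error(y_test, y_pred),
        "RMSE": np.sqrt(mse),
        "R2": r2,
        "Adjusted R2": 1 - (1 - r2) * (n - 1) / (n - p - 1),
    }
    residuals = np.asarray(y_test) - y_pred
    return model, scaler, metrics, y_pred, residuals


def plot_residuals(y_pred, residuals):
    """Residual histogram (normality) and residuals vs. predictions (homoscedasticity)."""
    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(residuals, bins=50)
    ax_hist.set_title("Residual distribution")
    ax_hist.set_xlabel("Residual")
    ax_scatter.scatter(y_pred, residuals, s=5, alpha=0.5)
    ax_scatter.axhline(0.0, color="red", linewidth=1)  # residuals should scatter evenly around 0
    ax_scatter.set_title("Residuals vs. predictions")
    ax_scatter.set_xlabel("Predicted value")
    ax_scatter.set_ylabel("Residual")
    return fig
```

On the real data this could be called as `housing = fetch_california_housing()` followed by `train_and_evaluate(housing.data, housing.target)`, then `plot_residuals(y_pred, residuals).savefig("residuals.png")`. A bell-shaped histogram supports the normality assumption, and a funnel shape in the scatter plot would suggest heteroscedasticity.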

📊 Libraries Used

Library | Purpose
--- | ---
pandas | Data handling
numpy | Numerical computation
seaborn | Visualization (pairplot, heatmap)
matplotlib | Plotting
scikit-learn (sklearn) | Dataset loading, ML models, metrics
pickle (Python standard library) | Model serialization

📈 Metrics Used

  • 📉 MSE – Mean Squared Error
  • 📉 MAE – Mean Absolute Error
  • 📉 RMSE – Root Mean Squared Error
  • 📈 R² Score – Goodness of fit
  • 📈 Adjusted R² – R² adjusted for the number of features
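
For reference, these are the standard definitions on n evaluated samples, with true values yᵢ, predictions ŷᵢ, mean ȳ, and p features (8 for this dataset). Adjusted R² is never larger than R², and the gap grows as more features are added relative to the sample size.

```latex
\begin{aligned}
\text{MSE}  &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{MAE}  &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\text{RMSE} &= \sqrt{\text{MSE}} \\
R^2         &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\
\bar{R}^2   &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\end{aligned}
```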

🗃️ Project Structure

File Name | Description
--- | ---
Multiple_Linear_Regression.ipynb | Full model implementation and evaluation
README.md | Project documentation (this file)
model.pkl | Serialized (pickled) trained model

🚀 How to Run the Project

  1. Clone the Repository

    git clone https://github.com/Mai3Prabhu/Multiple-Linear-Regression-with-Assumptions.git
    cd Multiple-Linear-Regression-with-Assumptions
    
  2. Install the required libraries

    pip install pandas numpy matplotlib seaborn scikit-learn jupyter
    
  3. Launch Jupyter Notebook

    jupyter notebook
    
  4. Open Multiple_Linear_Regression.ipynb and run the cells in order.

☁️ Model Deployment Tip

To deploy this model on the cloud:

  • Load the model.pkl file in your API/backend

  • Serve it with a web framework such as Flask or FastAPI, or with a cloud service such as AWS Lambda or Azure Functions

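  • Standardize incoming input data with the same fitted StandardScaler used during training; if model.pkl holds only the LinearRegression estimator, that scaler must be saved and loaded as well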

  • Perform prediction using:

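    import pickle
    with open("model.pkl", "rb") as f:  # closes the file after loading
        model = pickle.load(f)
    # new_scaled_data: new inputs transformed with the training-time scaler
    prediction = model.predict(new_scaled_data)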
    
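
Because only model.pkl is listed, the fitted scaler also has to be available at inference time. One way to guarantee identical preprocessing, offered here as an alternative rather than what the notebook does, is to pickle a scikit-learn Pipeline that bundles the scaler with the regressor. The file name model_pipeline.pkl and both helper functions are hypothetical. As with any pickle file, only load it from a trusted source, since unpickling can execute arbitrary code.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def save_pipeline(X_train, y_train, path="model_pipeline.pkl"):
    """Fit scaler + regressor as one object and pickle it (hypothetical helper)."""
    pipeline = make_pipeline(StandardScaler(), LinearRegression())
    pipeline.fit(X_train, y_train)  # the scaler is fitted on X_train inside the pipeline
    with open(path, "wb") as f:
        pickle.dump(pipeline, f)
    return pipeline


def load_and_predict(raw_features, path="model_pipeline.pkl"):
    """Load the pickled pipeline and predict on *unscaled* inputs; scaling happens inside."""
    with open(path, "rb") as f:
        pipeline = pickle.load(f)
    # atleast_2d lets a single feature row of length 8 be passed directly
    return pipeline.predict(np.atleast_2d(raw_features))
```

In a Flask or FastAPI service you would typically load the pipeline once at startup and call its predict() per request, passing raw feature rows in the same column order used for training.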

👩‍💻 Author

Maitri Prabhu

GitHub: Mai3Prabhu

