
A Python script that cleans house price data by removing invalid and missing values, preparing it for analysis.


FaNa-AI/preprocessing


🏠 House Price Data Cleaning Script 🧹

A simple and efficient Python script to clean the House Price Prediction dataset. It prepares your data by removing invalid and missing entries, making it ready for analysis or modeling.


✨ Features

  • 📥 Load data from Excel (.xlsx)

  • 🔍 Initial data inspection and summary statistics

  • 🚫 Remove rows with zero or negative values in critical columns:

    • LotArea, YearBuilt, YearRemodAdd, TotalBsmtSF, SalePrice
  • 🧽 Drop rows with missing values

  • 🔢 Keep only numeric columns

  • 💾 Save cleaned data to a new Excel file
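
The features listed here could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual `data_cleaning.py`: the function names `clean_house_prices` and `main`, the order of the steps, and the exact console message are hypothetical. The column list reuses the `zero_check_columns` name mentioned in the Notes, and the input path and output filename are the ones given under Usage. It also assumes the critical columns are already numeric.

```python
import pandas as pd

# Critical columns named in the README; each value must be strictly positive.
zero_check_columns = ["LotArea", "YearBuilt", "YearRemodAdd", "TotalBsmtSF", "SalePrice"]


def clean_house_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper, not the repo's code)."""
    # Keep rows where every critical column is > 0. A comparison against
    # NaN is False, so rows missing a critical value are dropped here too.
    positive = (df[zero_check_columns] > 0).all(axis=1)
    cleaned = df[positive]
    # Drop rows with a missing value in any remaining column.
    cleaned = cleaned.dropna()
    # Keep only numeric columns (text columns are discarded).
    cleaned = cleaned.select_dtypes(include="number")
    return cleaned.reset_index(drop=True)


def main() -> None:
    """Load, inspect, clean, and save (paths taken from the Usage section)."""
    df = pd.read_excel(r"c:\Ds\HousePricePrediction.xlsx")  # .xlsx needs openpyxl
    print(df.describe())  # summary statistics for the initial inspection
    cleaned = clean_house_prices(df)
    cleaned.to_excel("Cleaned_HousePricePrediction.xlsx", index=False)
    print(f"Remaining records: {len(cleaned)}")  # assumed message wording
```

Calling `main()` would run the whole pipeline on the Excel file. `clean_house_prices` can also be applied to any DataFrame that contains the five critical columns. Because missing values are dropped before the non-numeric columns are removed, a row with a gap in a text column is discarded as well; swapping those two steps would keep such rows.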


🚀 Usage

  1. Place your dataset here:

    c:\Ds\HousePricePrediction.xlsx
    
  2. Run the cleaning script:

    python data_cleaning.py
  3. Find the cleaned dataset here:

    Cleaned_HousePricePrediction.xlsx
    

📊 Output

  • Cleaned Excel file with only valid records
  • Console printout with the count of remaining records

⚙️ Requirements

  • Python 3.x
  • pandas
  • openpyxl

Install dependencies:

pip install pandas openpyxl

📝 Notes

  • Adjust zero_check_columns in the script if your dataset has different column names.
  • Ideal for preprocessing before machine learning or statistical analysis.
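
When adapting `zero_check_columns` to a dataset with different column names, it may help to check which of the configured names actually exist before filtering. The sketch below shows one way to do that. The helper name `resolve_check_columns` is hypothetical and is not part of the script.

```python
import pandas as pd


def resolve_check_columns(df: pd.DataFrame, wanted: list[str]) -> tuple[list[str], list[str]]:
    """Split the configured column names into (present, missing) for df."""
    present = [name for name in wanted if name in df.columns]
    missing = [name for name in wanted if name not in df.columns]
    return present, missing


# Example: a dataset that uses "Price" instead of "SalePrice".
sample = pd.DataFrame({"LotArea": [8450], "YearBuilt": [2003], "Price": [208500]})
present, missing = resolve_check_columns(sample, ["LotArea", "YearBuilt", "SalePrice"])
print("Checking:", present)
print("Not found (rename or remove from zero_check_columns):", missing)
```

Any names reported as not found are the ones to rename or remove in `zero_check_columns`.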
