A simple and efficient Python script to clean the House Price Prediction dataset. It prepares your data for analysis or modeling by removing rows with invalid (zero or negative) or missing values.
Features:
- Load data from Excel (.xlsx)
- Initial data inspection and summary statistics
- Remove rows with zero or negative values in critical columns: LotArea, YearBuilt, YearRemodAdd, TotalBsmtSF, SalePrice
- Drop rows with missing values
- Keep only numeric columns
- Save cleaned data to a new Excel file
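The feature list above can be sketched in pandas as follows. This is a minimal illustration, not the actual contents of data_cleaning.py: the function names clean and run are hypothetical, and the step order simply follows the list.

```python
import pandas as pd

# Critical columns from the feature list; adjust to match your dataset.
ZERO_CHECK_COLUMNS = ["LotArea", "YearBuilt", "YearRemodAdd", "TotalBsmtSF", "SalePrice"]


def clean(df: pd.DataFrame, zero_check_columns=ZERO_CHECK_COLUMNS) -> pd.DataFrame:
    """Apply the cleaning steps in the order the feature list gives them."""
    # Keep only rows where every critical column is strictly positive.
    # NaN > 0 evaluates to False, so rows missing a critical value go too.
    positive = (df[list(zero_check_columns)] > 0).all(axis=1)
    df = df[positive]
    # Drop rows that still have a missing value in any column.
    df = df.dropna()
    # Keep only numeric columns.
    return df.select_dtypes(include="number")


def run(input_path: str, output_path: str) -> pd.DataFrame:
    """Load, inspect, clean, and save (reading .xlsx needs openpyxl)."""
    df = pd.read_excel(input_path)
    # Initial inspection: column types, non-null counts, summary statistics.
    df.info()
    print(df.describe())
    cleaned = clean(df)
    cleaned.to_excel(output_path, index=False)
    print(f"Remaining records: {len(cleaned)}")
    return cleaned
```

For example, run(r"c:\Ds\HousePricePrediction.xlsx", "Cleaned_HousePricePrediction.xlsx") would read the input, print the inspection output, and write the cleaned file. The positivity filter assumes the critical columns are numeric; a text value in one of them would make the comparison fail.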
Usage:
- Place your dataset at: c:\Ds\HousePricePrediction.xlsx
- Run the cleaning script: python data_cleaning.py
- Find the cleaned dataset in: Cleaned_HousePricePrediction.xlsx
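If you change the input path inside data_cleaning.py, write Windows paths as raw strings. In an ordinary string literal, sequences such as \D and \H are invalid escapes that recent Python versions warn about, and a sequence such as \n would silently become a newline. A minimal sketch, assuming the script keeps its paths in module-level constants (INPUT_PATH and OUTPUT_PATH are hypothetical names, as is saving the output next to the input):

```python
from pathlib import PureWindowsPath

# Raw string: every backslash is kept literally.
INPUT_PATH = r"c:\Ds\HousePricePrediction.xlsx"

# Hypothetical: save the cleaned file in the same folder as the input.
# PureWindowsPath parses Windows-style paths the same way on any OS.
OUTPUT_PATH = str(
    PureWindowsPath(INPUT_PATH).with_name("Cleaned_HousePricePrediction.xlsx")
)
```

On Windows itself, pathlib.Path handles these paths the same way and can be passed directly to pandas.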
Output:
- A cleaned Excel file containing only valid records
- A console message reporting the number of remaining records
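To confirm that the output file contains only valid records, you could reload it and test the same conditions the cleaning steps enforce. A minimal sketch; check_cleaned is a hypothetical helper, and it assumes the critical columns listed in the features:

```python
import pandas as pd

# Assumed to match the critical columns used by the cleaning script.
ZERO_CHECK_COLUMNS = ["LotArea", "YearBuilt", "YearRemodAdd", "TotalBsmtSF", "SalePrice"]


def check_cleaned(df: pd.DataFrame, zero_check_columns=ZERO_CHECK_COLUMNS) -> list:
    """Return a list of problems; an empty list means the data passes."""
    problems = []
    # No missing values should survive the drop-missing step.
    if df.isna().any().any():
        problems.append("missing values remain")
    # Every column should be numeric after the numeric-only step.
    numeric = df.select_dtypes(include="number").columns
    non_numeric = [col for col in df.columns if col not in numeric]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # Every critical column should be present and strictly positive.
    for col in zero_check_columns:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif pd.api.types.is_numeric_dtype(df[col]) and (df[col] <= 0).any():
            problems.append(f"non-positive values in {col}")
    return problems
```

For a correctly cleaned file, check_cleaned(pd.read_excel("Cleaned_HousePricePrediction.xlsx")) should return an empty list.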
Requirements:
- Python 3.x
- pandas
- openpyxl
Install dependencies:
pip install pandas openpyxl
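Before running the script, you can check whether both packages are importable; pandas reads .xlsx files through openpyxl, so a missing openpyxl only shows up once the file is loaded. A minimal sketch using the standard library; missing_packages is a hypothetical helper name:

```python
import importlib.util

# Packages named in the requirements list.
REQUIRED_PACKAGES = ["pandas", "openpyxl"]


def missing_packages(packages=REQUIRED_PACKAGES) -> list:
    """Return the names of packages that cannot be found for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip install pandas openpyxl")
    else:
        print("All requirements found.")
```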
Notes:
- Adjust zero_check_columns in the script if your dataset uses different column names.
- Suitable as a preprocessing step before machine learning or statistical analysis.
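When adapting zero_check_columns, it can help to check the configured names against the dataset before filtering, so that a renamed column produces a clear message. A minimal sketch; filter_positive is a hypothetical helper, not a function from the original script:

```python
import pandas as pd


def filter_positive(df: pd.DataFrame, zero_check_columns: list) -> pd.DataFrame:
    """Keep rows where every column in zero_check_columns is greater than 0.

    Raises KeyError naming any configured column the dataset lacks.
    """
    missing = [col for col in zero_check_columns if col not in df.columns]
    if missing:
        raise KeyError(
            f"Columns not found in dataset: {missing}; "
            "update zero_check_columns to match your column names"
        )
    return df[(df[list(zero_check_columns)] > 0).all(axis=1)]
```

If your file names a column differently, for example Lot_Area, you can either list that name in zero_check_columns or rename it first with df.rename(columns={"Lot_Area": "LotArea"}).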