This project involves analyzing a dataset of used cars to understand factors influencing their selling prices and preparing the data for predictive modeling. Visualizations and machine learning techniques are employed to derive insights and create a model to predict the selling prices of cars.
- Dataset Description
- Project Workflow
- Libraries Used
- Installation
- Exploratory Data Analysis (EDA)
- Data Preparation
- Modeling
- Visualization Examples
The dataset contains information about 301 used cars with the following features:
- Car_Name: Name of the car (removed during preprocessing).
- Year: Year of manufacture.
- Selling_Price: Price the car was sold for (target variable).
- Present_Price: Current market price of the car.
- Driven_kms: Kilometers driven by the car.
- Fuel_Type: Type of fuel used (Petrol, Diesel, or CNG).
- Selling_type: Selling method (Dealer or Individual).
- Transmission: Type of transmission (Manual or Automatic).
- Owner: Number of previous owners.
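A minimal loading sketch, assuming the data ships as a CSV file whose header matches the column names listed above. The names `EXPECTED_COLUMNS` and `load_cars` are hypothetical helpers for illustration, not part of the project.

```python
import pandas as pd

# Column names as listed in the dataset description (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "Car_Name", "Year", "Selling_Price", "Present_Price", "Driven_kms",
    "Fuel_Type", "Selling_type", "Transmission", "Owner",
]


def load_cars(path: str) -> pd.DataFrame:
    """Read the used-car CSV and fail early if any expected column is missing."""
    df = pd.read_csv(path)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df
```

Usage, with a hypothetical file name: `df = load_cars("car data.csv")`. Checking the schema up front makes a renamed or missing column fail at load time instead of deep inside preprocessing.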
- Load and explore the dataset.
- Perform exploratory data analysis (EDA) to identify patterns and trends.
- Preprocess the data:
  - Remove unnecessary columns.
  - Encode categorical variables.
  - Derive additional features.
- Visualize relationships and distributions.
- Split the data into training and testing sets.
- Train and test machine learning models.
- pandas: Data manipulation and preprocessing.
- numpy: Mathematical operations.
- seaborn: Data visualization.
- matplotlib: Plotting graphs.
- scikit-learn: Train/test splitting, regression models, and evaluation metrics.
- Clone this repository:
git clone https://github.com/your-repo/used-car-analysis.git
- Navigate to the project directory:
cd used-car-analysis
- Install the required dependencies:
pip install -r requirements.txt
Perform EDA to uncover patterns and relationships in the data. Key steps include:
- Inspecting the dataset structure and summary statistics.
- Visualizing distributions and correlations.
- Identifying missing or inconsistent data.
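The EDA checks above can be collected in one place. This is a sketch under the assumption that the data is already loaded into a pandas DataFrame; `summarize` and its dictionary keys are hypothetical names chosen for illustration.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Gather structure, summary statistics, correlations and data-quality checks."""
    return {
        # Structure: number of rows/columns and the dtype of each column.
        "shape": df.shape,
        "dtypes": df.dtypes,
        # Summary statistics for numeric and categorical columns alike.
        "describe": df.describe(include="all"),
        # Pairwise correlations between numeric columns only.
        "numeric_correlations": df.select_dtypes(include="number").corr(),
        # Data-quality checks: missing values per column and fully duplicated rows.
        "missing_per_column": df.isna().sum(),
        "duplicate_rows": int(df.duplicated().sum()),
    }
```

Returning plain pandas objects keeps the function usable both in a notebook (display each entry) and in a script (assert on the counts).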
Preprocess the data for analysis and modeling:
- Remove the Car_Name column (irrelevant for prediction).
- Convert the Year column into car age.
- Encode categorical columns (Fuel_Type, Selling_type, Transmission) using one-hot encoding or label encoding.
- Normalize or scale features if needed.
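The preprocessing steps can be sketched as a single function. This sketch picks one-hot encoding (via `pandas.get_dummies`) out of the two options mentioned, and it assumes car age is measured against a caller-supplied `reference_year`, since the README does not say which year is used. `preprocess` and `CATEGORICAL_COLUMNS` are hypothetical names.

```python
import pandas as pd

CATEGORICAL_COLUMNS = ["Fuel_Type", "Selling_type", "Transmission"]


def preprocess(df: pd.DataFrame, reference_year: int) -> pd.DataFrame:
    """Drop Car_Name, replace Year with Car_Age and one-hot encode categoricals."""
    # Car_Name is treated as irrelevant for prediction; drop() returns a new frame.
    out = df.drop(columns=["Car_Name"])
    # Age relative to an assumed reference year (e.g. the year the data was collected).
    out["Car_Age"] = reference_year - out["Year"]
    out = out.drop(columns=["Year"])
    # drop_first=True keeps k-1 indicator columns per category, which avoids
    # perfectly collinear dummy columns in linear regression.
    return pd.get_dummies(out, columns=CATEGORICAL_COLUMNS, drop_first=True, dtype=int)
```

Scaling is left out of this sketch: tree-based models are insensitive to feature scale, and ordinary least squares predictions do not change under rescaling, though scaling matters for regularized or distance-based models.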
- Split the dataset into training and test sets.
- Train multiple regression models, such as:
  - Linear Regression
  - Random Forest Regressor
  - Gradient Boosting Regressor
- Evaluate model performance using metrics like:
  - Mean Absolute Error (MAE)
  - Mean Squared Error (MSE)
  - R² Score
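The split/train/evaluate loop above might look like the following scikit-learn sketch. The 80/20 split, `random_state=42`, and default hyperparameters are assumptions for illustration, and `evaluate_models` is a hypothetical helper.

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate_models(X, y, test_size=0.2, random_state=42):
    """Fit each regressor on a training split and score it on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # Default hyperparameters; a fixed random_state makes the tree ensembles reproducible.
    models = {
        "Linear Regression": LinearRegression(),
        "Random Forest Regressor": RandomForestRegressor(random_state=random_state),
        "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "MAE": mean_absolute_error(y_test, pred),
            "MSE": mean_squared_error(y_test, pred),
            "R2": r2_score(y_test, pred),
        }
    return results
```

With a preprocessed DataFrame `prepared`, the features and target would be `X = prepared.drop(columns=["Selling_Price"])` and `y = prepared["Selling_Price"]`. Using the same split for every model keeps the metrics directly comparable.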
- Price vs. Driven Kilometers: Understand how usage affects the price.
- Fuel Type Distribution: Analyze the popularity of different fuel types.
- Correlation Heatmap: Visualize relationships among numerical features.
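A sketch of the three example plots with seaborn, assuming the raw (unencoded) DataFrame so that Fuel_Type is still a single text column. Drawing them as panels of one figure is a layout choice for this example, and `plot_overview` is a hypothetical name.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_overview(df: pd.DataFrame):
    """Draw the three example plots side by side and return the figure."""
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))

    # Price vs. driven kilometers: how usage relates to selling price.
    sns.scatterplot(data=df, x="Driven_kms", y="Selling_Price", ax=axes[0])
    axes[0].set_title("Price vs. Driven Kilometers")

    # Fuel type distribution: number of cars per fuel type.
    sns.countplot(data=df, x="Fuel_Type", ax=axes[1])
    axes[1].set_title("Fuel Type Distribution")

    # Correlation heatmap restricted to numeric columns.
    corr = df.select_dtypes(include="number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=axes[2])
    axes[2].set_title("Correlation Heatmap")

    fig.tight_layout()
    return fig
```

Usage: `plot_overview(df)` followed by `plt.show()`. Returning the figure instead of showing it inside the function also allows saving it with `fig.savefig(...)`.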