Predicting Used Car Prices with Machine Learning
Project Title: Used Car Price Prediction Using Regression Models
Individual Project – MS in Business Analytics, UMass Amherst
Overview: This project focuses on building a predictive model to estimate the resale value of used cars based on features such as mileage, engine capacity, brand, fuel type, and more. The goal was to apply regression-based machine learning techniques to create a model that can deliver accurate and interpretable price predictions.
Objectives:
• Clean and preprocess real-world used car data
• Perform exploratory data analysis (EDA) to uncover trends and correlations
• Build and compare multiple regression models, including Linear Regression, Ridge, Lasso, and Random Forest
• Evaluate model performance using RMSE and R² metrics
• Optimize performance through hyperparameter tuning and feature engineering
Models & Methods Used:
• Data Preprocessing: Outlier removal, categorical variable encoding, missing value treatment
• Feature Engineering: Log transforms, scaling, interaction features
• Regression Models: Linear Regression, Ridge and Lasso Regression, Random Forest Regressor
• Evaluation Metrics: Root Mean Squared Error (RMSE), R² Score (Coefficient of Determination)
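The methods listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the dataset is synthetic, the column names (mileage_km, engine_cc, brand, fuel_type, price) and the helper functions are hypothetical, and the hyperparameter values are arbitrary. It covers imputation, one-hot encoding, scaling, a log transform of the target, and a comparison of the four models by RMSE and R². Outlier removal and interaction features are left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real dataset's schema may differ.
NUMERIC = ["mileage_km", "engine_cc"]
CATEGORICAL = ["brand", "fuel_type"]


def make_synthetic_cars(n=400, seed=0):
    """Generate made-up data so the sketch runs without the project's dataset."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "mileage_km": rng.uniform(5_000, 200_000, n),
        "engine_cc": rng.choice([1000.0, 1200.0, 1500.0, 2000.0, 3000.0], n),
        "brand": rng.choice(["A", "B", "C"], n),
        "fuel_type": rng.choice(["petrol", "diesel"], n),
    })
    log_price = (
        10.0
        - 0.000005 * df["mileage_km"]
        + 0.0003 * df["engine_cc"]
        + df["brand"].map({"A": 0.0, "B": 0.2, "C": 0.4})
        + rng.normal(0, 0.1, n)
    )
    df["price"] = np.exp(log_price)
    # Blank out 5% of engine sizes to mimic missing values.
    missing = df.sample(frac=0.05, random_state=seed).index
    df.loc[missing, "engine_cc"] = np.nan
    return df


def build_preprocessor():
    """Median-impute and scale numeric columns; one-hot encode categoricals."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = OneHotEncoder(handle_unknown="ignore")
    return ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])


def evaluate_models(df, seed=0):
    """Fit each model on log1p(price) and return {name: (rmse, r2)} on the test split."""
    X, y = df[NUMERIC + CATEGORICAL], df["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.001),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, regressor in models.items():
        pipeline = Pipeline([("prep", build_preprocessor()), ("reg", regressor)])
        # Train on the log of the price, then map predictions back to price units.
        model = TransformedTargetRegressor(
            regressor=pipeline, func=np.log1p, inverse_func=np.expm1
        )
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
        results[name] = (rmse, float(r2_score(y_test, pred)))
    return results


if __name__ == "__main__":
    for name, (rmse, r2) in evaluate_models(make_synthetic_cars()).items():
        print(f"{name:>14}: RMSE={rmse:,.0f}  R2={r2:.3f}")
```

Wrapping the preprocessing and the regressor in one pipeline means the imputer, scaler, and encoder are fitted on the training split only, which avoids leaking test-set statistics into the model.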
Tech Stack:
• Python: Core language
• Libraries: Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn
• Jupyter Notebook: Development environment
Results:
• Achieved ~95% accuracy on test data using the Random Forest Regressor (for a regression model, this figure presumably refers to the R² score)
• Improved prediction stability through feature tuning and cross-validation
• Visualized residuals and feature importances for better interpretability
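As an illustration of the cross-validation, tuning, and feature-importance steps mentioned in the results, the sketch below runs 5-fold cross-validation on a Random Forest, searches a small parameter grid, and ranks the features by importance. It is an assumption-laden example rather than the notebook's code: the data is synthetic, the feature names (mileage_km, engine_cc, age_years) and helper functions are hypothetical, and the grid values are chosen only to keep the run short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score


def make_toy_data(n=300, seed=0):
    """Made-up features and prices; the real project data will differ."""
    rng = np.random.default_rng(seed)
    X = pd.DataFrame({
        "mileage_km": rng.uniform(5_000, 200_000, n),
        "engine_cc": rng.choice([1000.0, 1500.0, 2000.0, 3000.0], n),
        "age_years": rng.integers(1, 15, n).astype(float),
    })
    y = (
        20_000
        - 0.05 * X["mileage_km"]
        + 3.0 * X["engine_cc"]
        - 600 * X["age_years"]
        + rng.normal(0, 500, n)
    )
    return X, y


def tune_random_forest(X, y, seed=0):
    """Return baseline CV RMSE, tuned CV RMSE, best params, and feature importances."""
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    scoring = "neg_root_mean_squared_error"  # scikit-learn negates errors so higher is better

    # Baseline: default settings, scored with 5-fold cross-validation.
    baseline = RandomForestRegressor(n_estimators=50, random_state=seed)
    baseline_rmse = -cross_val_score(baseline, X, y, cv=cv, scoring=scoring).mean()

    # Small illustrative grid; it includes the default settings used by the baseline.
    param_grid = {"max_depth": [4, None], "min_samples_leaf": [1, 5]}
    search = GridSearchCV(
        RandomForestRegressor(n_estimators=50, random_state=seed),
        param_grid,
        cv=cv,
        scoring=scoring,
    )
    search.fit(X, y)

    # Impurity-based importances of the refitted best model, largest first.
    importances = pd.Series(
        search.best_estimator_.feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    return baseline_rmse, -search.best_score_, search.best_params_, importances


if __name__ == "__main__":
    X, y = make_toy_data()
    baseline_rmse, tuned_rmse, best_params, importances = tune_random_forest(X, y)
    print(f"Baseline CV RMSE: {baseline_rmse:,.0f}")
    print(f"Tuned CV RMSE:    {tuned_rmse:,.0f} with {best_params}")
    print(importances)
```

Reporting the mean score across folds rather than a single train/test split is what gives the stability mentioned above: one unlucky split has less influence on the estimate.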
Files Included: Predicting_Used_Car_Prices_Project.ipynb: Full notebook with EDA, model training, evaluation, and conclusions
Sample Use Cases:
• Car resale platforms estimating trade-in value
• Dealerships optimizing pricing strategy
• Consumers comparing vehicle prices before purchase
Author: Anand Gupta
M.S. in Business Analytics – University of Massachusetts Amherst