This Sales Prediction Model is part of my CodeAlpha internship. It uses machine learning, specifically linear regression, to predict sales from advertising spend across TV, Radio, and Newspaper channels, so that businesses can forecast sales from their planned advertising budgets.
In this project, I've implemented a linear regression model to predict sales based on the amount spent on advertising across TV, Radio, and Newspaper channels. The model's performance is evaluated using key metrics such as Mean Squared Error, Root Mean Squared Error, and R-squared.
Predict sales based on advertising spending in three channels: TV, Radio, and Newspaper.
Visualize relationships between features and sales through regression plots and heatmaps.
Evaluate performance using R-squared and error metrics (MSE and RMSE).
Trained on real-world sales data from advertising campaigns.
Python
Pandas for data manipulation
Scikit-Learn for machine learning
Matplotlib and Seaborn for visualizations
Data Preprocessing:
Loaded and cleaned the dataset, removing unnecessary columns.
Used exploratory data analysis (EDA) to understand feature relationships and correlations.
Visualized feature relationships with regression plots and a heatmap to identify trends.
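The preprocessing and EDA steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in, and the `Unnamed: 0` index column, the coefficients used to generate `Sales`, and the `plot_eda` helper are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the advertising dataset (hypothetical values).
# With the real data, this frame would come from pd.read_csv on the dataset file.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Unnamed: 0": np.arange(1, n + 1),  # leftover index column (assumed)
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 3 + 0.045 * df["TV"] + 0.19 * df["Radio"] + rng.normal(0, 1.5, n)

# Drop the column that carries no information, then remove incomplete rows.
clean = df.drop(columns=["Unnamed: 0"]).dropna()

# Pairwise Pearson correlations between the features and the target.
corr = clean.corr()
print(corr["Sales"].sort_values(ascending=False))


def plot_eda(data: pd.DataFrame) -> None:
    """Draw one regression plot per channel plus a correlation heatmap."""
    # Imported here so the numeric part above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(1, 4, figsize=(20, 4))
    for ax, col in zip(axes[:3], ["TV", "Radio", "Newspaper"]):
        sns.regplot(data=data, x=col, y="Sales", ax=ax)
    sns.heatmap(data.corr(), annot=True, cmap="coolwarm", ax=axes[3])
    plt.tight_layout()
    plt.show()
```

Calling `plot_eda(clean)` would show the three regression plots and the heatmap side by side.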
Model Selection:
Split the dataset into training and testing sets using an 80/20 ratio.
Trained a Linear Regression model on the training data to predict sales.
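As a rough sketch of the split-and-train step, the snippet below fits scikit-learn's `LinearRegression` on an 80/20 split. The data is synthetic and the `random_state` value is an arbitrary choice for reproducibility; neither is taken from the project itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (hypothetical values).
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
y = 3 + 0.045 * X["TV"] + 0.19 * X["Radio"] + rng.normal(0, 1.5, n)

# Hold out 20% of the rows for testing; the seed is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Ordinary least squares fit on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)

# One coefficient per channel: estimated change in sales per unit of spend.
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```

Keeping the test rows out of `fit` is what makes the later metrics an estimate of performance on unseen data.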
Model Evaluation:
Evaluated the model's performance using the R-squared score, Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
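For illustration, the three metrics can be computed with scikit-learn as shown below. The `y_true` and `y_pred` arrays are made-up toy values chosen so the arithmetic is easy to check by hand; they are not the project's predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy values (hypothetical), not taken from the project's test set.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)  # mean of squared residuals
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_true, y_pred)             # share of variance explained

print(f"MSE:  {mse:.2f}")   # MSE:  0.50
print(f"RMSE: {rmse:.2f}")  # RMSE: 0.71
print(f"R2:   {r2:.2f}")    # R2:   0.90
```

In a real run, `y_true` would be the held-out test targets and `y_pred` the output of the model's `predict` on the test features.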
Exploratory Data Analysis (EDA): Learned how to visualize and analyze the relationships between features and target variables.
Linear Regression: Gained insights into how regression models can be used to predict continuous outcomes like sales.
Performance Metrics: Learned to assess model performance using metrics such as R-squared, MSE, and RMSE.
Data Visualization: Strengthened skills in data visualization to communicate insights effectively.
After evaluating the model on test data, here are the performance metrics:
Mean Squared Error: 3.17
Root Mean Squared Error: 1.78
R-squared: 0.90
The model performed well on the test set, with an R-squared score of 0.90, meaning it explains about 90% of the variance in sales.
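For reference, these are the standard definitions behind the reported numbers, where $y_i$ is an observed sales value, $\hat{y}_i$ the model's prediction, $\bar{y}$ the mean of the observed values, and $n$ the number of test samples. The symbols are notation introduced here for illustration; the last line only checks that the reported RMSE is consistent with the reported MSE.

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
R^2           &= 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
                          {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \\
\sqrt{3.17}   &\approx 1.78
\end{aligned}
```

Because RMSE is expressed in the same units as sales, it is usually the easier of the two error metrics to interpret.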
GitHub Repository:
A big thank you to CodeAlpha for the opportunity to grow and expand my skills in Data Science and Machine Learning!
Contributions are welcome! If you have suggestions or want to help improve this project:
Go to the repository on GitHub, open the 'Pull requests' tab, and submit your changes for review. I'll review your pull request and merge it if everything looks good!