This project is inspired by a Youtube tutorial, where I apply and adapt the skills I've learned to analyze data from a simulated business scenario with a dataset made up of several tables. The goal is to solve three major business problems by finding insights and proposing possible solution. The process involves using data analysis skills and tools, including SQL (via SQL Server), Python, and PowerBI to analyze, clean and visualize the data.
This project helps me understand the process, learn how to use programming languages in working application, and identify my mistakes to improve. It also shows me how data analysis can help solve real-world business challenges.
given in Data source folder ( as .bak
file ).
- Introduction to Business Problem
- Data Model
- Query Data Using SQL
- Sentiment Analysis Using Python
- Building an Interactive Dashboard With Power BI
- Data Analysis
- Credits
Suppose that there are Simulated Market named ShopEasy, an online retail business, is facing reduced customer engagement and conversion rates. They are reaching out to you to help conduct a detailed analysis and identify areas for improvement in their marketing strategies.
From reading above, I set that KPIs are
- Conversion Rate : Percentage of website visitors who make a purchase.
- Customer Engagement Rate : Level of interaction with marketing content (clicks, likes, comments)
- Customer Feedback Score : Average rating from customer reviews.
So I provided my goals are
- Increase Conversion Rates : Identify factors impacting the conversion rate and provide recommendations to improve it.
- Enhance Customer Engagement : Determine which types of content make the highest engagement.
- Improve Customer Feedback Scores : Understand common themes in customer reviews and provid actionable insights.
Defining an effective data structure in a dashboard is important. The images below show tables used in analysis.
Tables used in this model
- customers - Data related to each individual customers, their CustomerID, Age, etc.
- products - Data related to each individual products, their ProductID, price, etc.
- customer_journey - Data of every Jouney recorded by their CustomerID, ProductID, action such as View and Click etc.
- customer_reviews - Every review recorded and their reviewText.
- engagement_data - Data of every engagement relate to their ContentType.
- geography - GeographyID reference for mapping countries to their capital cities.
Open and connect to my local database on my computer using Microsoft SQL Server Management Studio, then restore the .bak
file and have database named PortfolioProject_MarketingAnalytics. Write an SQL statement for each table based on requirements. Below is a summary of the work I have completed.
- customers : Query to join dim_customers with dim_geography to enrich customer data with geographic information. The new table is
dim_customers
. - products : Query to categoriz products based on their price. The new table is
dim_products
. - customer_journey : Query to clean table by removing duplicates, handle missing values by replacing them with calculated averages then get a new table named
fact_customer_journey
. - customer_reviews : Query to clean whitespace issues in the reviewText column. The new table is
fact_customer_reviews
. - engagement_data : Query for Extracting and splitting combined metrics (Views and Clicks), formatting dates for uniformity, and standardizing values. The new table is
fact_engagement_data
.
You can see full codes in SQL Queries folder.
In the fact_customer_reviews table, there are various review comments. I applied Python libraries to classify and categorize the comments into distinct categories. The nltk library and SentimentIntensityAnalyzer are important in classification of Sentiment. Additionally, pandas and pyodbc also play crucial roles.
Steps
- Import Libaries :
Pandas
for data manipulation,pyodbc
for database connection, andnltk
for sentiment analysis. - Fetch Data from SQL Server : Connect to a Microsoft SQL Server database (PortfolioProject_MarketingAnalytics). Run a SQL query to fetch customer review data into a Pandas DataFrame.
- Calculate SentimentScore : Initialize
VADER Sentiment Analyzer
, then analyze the ReviewText using VADER and calculate a compound sentiment scores (ranging from -1 to 1). AddSentimentScore column
to the Dataframe. - Categorize Sentiments : Combine the sentiment score and the review's numerical Rating to classify each review into categories like Positive, Negative, Mixed Positive, etc. Add
SentimentCategory
column to the Dataframe. - Bucket Sentiment Scores : Group a Sentiment score into predefine text ranges such as 0.5 to 1.0 (Strongly Positive), 0.0 to 0.49 (Mildly Positive), etc. Add
SentimentBucket
column to the Dataframe. - Save the updated DataFrame to a new CSV file named
fact_customer_reviews_with_sentiment.csv
.
you can view full code in Python Code folder.
Start making an Interface by importing every modified table before to Power BI, Then Making an Calender table for connect evey table having Date column ( Dax code in Power BI Dashboard folder. ).
Set the Datatype for each columnn correctly to get the relationship of all tables as following :
Creating a dashboard to display all Key Performance Indicators(KPIs), and complete with filtering options for years, months, and product name.
Provides a comprehensive snapshot of key performance indicators (KPIs) such as conversion rate, social media engagement, and customer reviews.
Analyze the conversion by differnt action and product-specific perfomance.
Tracks social media important engagement result and the impact of content types.
Evaluate Customer feedback and rating to different products to find actionable insight.
.pbix
file and dashboard screen shot are in Dashboard folder.
Here is a brief analysis and summary following provided goals above. The full presentation slide are Data Presentation Full.
- Increase Conversion Rates
January recorded the highest overall conversion rate at 17.3%, driven significantly by the Ski Boots.
Suggesttion : Focus marketing efforts on products that show high conversion rates, such as Kayaks, Ski Boots, and Baseball Gloves. Implement seasonal promotions during peak months (e.g., January and September) to make these trend significantly more profitable.
- Enhance Customer Engagement
Blog content drove the most views, especially in April and July, while social media and video content maintained steady.
Suggestion : Restore Content Strategy. Experiment with more engaging content formats, such as blog content, particularly during historically lower-engagement months (September-December).
- Improve Customer Feedback Scores
The majority of customer reviews are in the higher ratings, with 431 reviews at 4 stars and 409 reviews at 5 stars, indicating overall positive feedback.
Suggestion : The existence of mixed positive and mixed negative sentiments suggests that there are opportunities to convert those mixed experiences into more clearly positive ones.
Most of contents in this project was created by Jirapat Chaiya, a Computer Engineering student at the Faculty of Engineering, Chulalongkorn University.
This project is inspired and guided by Ali Ahmad Special thanks to him for his valuable insights and tutorials.