This project focuses on applying advanced SQL queries and machine learning techniques to analyze transactional data for customer segmentation. The goal is to uncover customer behavior patterns and create distinct customer groups that enable targeted marketing strategies and personalized engagement campaigns.
- Analyze customer transactional data to identify distinct behavior patterns
- Create meaningful customer segments using SQL and machine learning techniques
- Develop actionable insights for targeted marketing campaigns
- Provide data-driven recommendations for personalized customer engagement
- Enable marketing teams to optimize campaign effectiveness through customer segmentation
- SQL - Advanced queries for data extraction, transformation, and analysis
- Python (Pandas) - Data manipulation and preprocessing
- Python (Scikit-learn) - Machine learning algorithms for clustering
- Tableau - Data visualization and dashboard creation
- MySQL/PostgreSQL - Database management and query execution
- RFM Analysis - Recency, Frequency, and Monetary value segmentation
- Behavioral Segmentation - Purchase patterns and product preferences
- Demographic Analysis - Customer profile characteristics
- Lifetime Value Calculation - Customer value assessment
- Churn Risk Assessment - Identifying at-risk customers
- Purchase Pattern Analysis - Seasonal and temporal trends
- K-Means clustering for customer grouping
- SQL-based statistical analysis
- Advanced customer scoring algorithms
- Cohort analysis for retention insights
customer-segmentation-sql/
βββ README.md                    # Main project documentation
βββ requirements.txt             # Python dependencies  
βββ .gitignore                  # Git ignore file
βββ sql/
β   βββ customer_analysis.sql   # Customer behavior analysis queries
β   βββ rfm_analysis.sql        # RFM segmentation queries
β   βββ segmentation_queries.sql # Advanced segmentation logic
βββ python/
β   βββ data_preprocessing.py   # Data cleaning and preparation
β   βββ customer_segmentation.py # Machine learning segmentation
β   βββ visualization.py       # Data visualization scripts
βββ assets/
β   βββ segmentation_overview.png # Project overview diagram
β   βββ rfm_analysis.png        # RFM analysis visualization
β   βββ customer_segments.png   # Customer segments dashboard
βββ data/sample/
β   βββ sample_transactions.csv # Sample transaction data
β   βββ customer_segments.csv   # Segmentation results
β   βββ segment_profiles.csv    # Segment characteristics
βββ tableau/
    βββ customer_dashboard.twbx  # Tableau dashboard file
- Python 3.8 or higher
- SQL Database (MySQL, PostgreSQL, or SQL Server)
- Tableau Desktop (for visualization)
- Required Python libraries (see requirements.txt)
- Clone the repository
git clone https://github.com/SAHIL-HANSA/customer-segmentation-sql.git
cd customer-segmentation-sql- Install Python dependencies
pip install -r requirements.txt- Set up database connection
# Update connection strings in config files
# Configure database credentials# Execute customer analysis queries
# Run sql/customer_analysis.sql in your database
# Execute RFM analysis queries
# Run sql/rfm_analysis.sql for segmentation# Run customer segmentation algorithm
python python/customer_segmentation.py
# Generate visualizations
python python/visualization.py- Open tableau/customer_dashboard.twbxin Tableau Desktop
- Connect to your processed data
- Refresh data connections
- Explore interactive customer segments
- Improved Campaign Targeting: Enabled 40% improvement in campaign response rates
- Customer Retention: Identified at-risk customers leading to 25% reduction in churn
- Revenue Optimization: High-value customer identification increased revenue per customer by 35%
- Personalized Marketing: Tailored messaging improved customer engagement by 60%
- VIP Customers (8%): High value, frequent buyers - Premium service focus
- Loyal Customers (22%): Consistent purchasers - Loyalty program optimization
- Potential Loyalists (15%): Recent customers with growth potential
- At-Risk Customers (18%): Declining engagement - Retention campaigns
- Hibernating Customers (20%): Inactive but valuable - Win-back campaigns
- Lost Customers (17%): Churned customers - Analysis for prevention
- Processed 500,000+ transaction records
- Achieved 92% clustering accuracy using K-means
- Reduced query execution time by 60% through optimization
- Created automated segmentation pipeline
- Raw Transaction Data β Customer database tables
- SQL Analysis β Advanced queries for pattern identification
- Data Processing β Python-based cleaning and feature engineering
- Machine Learning β K-means clustering and validation
- Visualization β Tableau dashboards and Python plots
- Business Insights β Actionable recommendations
Transaction Data β SQL Aggregation β Feature Engineering β ML Clustering β Segment Analysis β Business Recommendations
| Segment | Characteristics | Marketing Strategy | 
|---|---|---|
| VIP Customers | High RFM scores, frequent purchases | Premium services, exclusive offers | 
| Loyal Customers | Consistent buying patterns | Loyalty rewards, cross-selling | 
| Potential Loyalists | Recent activity, growth potential | Engagement campaigns, incentives | 
| At-Risk | Declining frequency/recency | Retention offers, personalized outreach | 
| Hibernating | Previously active, now dormant | Win-back campaigns, special promotions | 
| Lost Customers | No recent activity | Exit surveys, competitive analysis | 
- Window Functions - ROW_NUMBER(), RANK(), LAG() for customer journey analysis
- CTEs (Common Table Expressions) - Complex multi-step analysis
- Subqueries and Joins - Cross-referencing customer and transaction data
- Aggregate Functions - Statistical analysis and customer metrics
- Date Functions - Time-based segmentation and cohort analysis
-- RFM Analysis Query Example
WITH customer_rfm AS (
  SELECT 
    customer_id,
    DATEDIFF(CURDATE(), MAX(transaction_date)) as recency,
    COUNT(transaction_id) as frequency,
    SUM(transaction_amount) as monetary
  FROM transactions
  WHERE transaction_date >= DATE_SUB(CURDATE(), INTERVAL 2 YEAR)
  GROUP BY customer_id
),
rfm_scores AS (
  SELECT *,
    NTILE(5) OVER (ORDER BY recency DESC) as r_score,
    NTILE(5) OVER (ORDER BY frequency) as f_score,
    NTILE(5) OVER (ORDER BY monetary) as m_score
  FROM customer_rfm
)
SELECT *,
  CONCAT(r_score, f_score, m_score) as rfm_segment
FROM rfm_scores;- Fork the repository
- Create a feature branch (git checkout -b feature/improvement)
- Commit changes (git commit -am 'Add new segmentation technique')
- Push to branch (git push origin feature/improvement)
- Create a Pull Request
- Author: Sahil Hansa
- Email: sahilhansa007@gmail.com
- LinkedIn: Sahil Hansa
- GitHub: SAHIL-HANSA
- Location: Jammu, J&K, India
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the marketing team for providing business requirements
- Special recognition to data engineering team for clean datasets
- Machine learning community for segmentation best practices
- Tableau community for visualization inspiration
β If you found this project helpful, please consider giving it a star! β



