This repository contains a comprehensive data analysis and modelling project focused on improving customer service for a banking institution. The project includes:
Domain Analysis: Identifying key business problems related to customer risk assessment and retention.
Database Design: Normalizing and structuring data from a provided CSV file into an SQLite database.
Research Design: Implementing various data science techniques to analyze loan statuses, customer demographics, and retention strategies.
Experimental Results: Analyzing and presenting findings to stakeholders.
Task 1: Domain Analysis
Brief description of the business problem, its significance, and potential solutions.
Overview of the investigation areas and techniques used.
Task 2: Database Design
Conceptual design and normalization of the database schema.
SQL scripts to create and populate tables in the SQLite database.
Explanation of assumptions, keys, and relationships.
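As a rough illustration of the normalization step, the sketch below splits a flat record set into two related tables in an in-memory SQLite database. The table names (customer, loan), the column names, and the sample rows are all hypothetical stand-ins; the actual schema and CSV columns used in this repository may differ.

```python
import sqlite3

# Hypothetical flat rows standing in for the provided CSV:
# (customer_name, city, loan_amount, loan_status)
rows = [
    ("Alice", "Leeds", 5000.0, "Approved"),
    ("Alice", "Leeds", 1200.0, "Rejected"),
    ("Bob", "York", 3000.0, "Approved"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign key below

# Assumed schema: customer attributes live in one table, loans reference them by key.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL,
    UNIQUE (name, city)
);
CREATE TABLE loan (
    loan_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL,
    status      TEXT NOT NULL
);
""")

for name, city, amount, status in rows:
    # Insert each customer once; repeated customers are skipped by the UNIQUE constraint.
    conn.execute(
        "INSERT OR IGNORE INTO customer (name, city) VALUES (?, ?)", (name, city)
    )
    (customer_id,) = conn.execute(
        "SELECT customer_id FROM customer WHERE name = ? AND city = ?", (name, city)
    ).fetchone()
    conn.execute(
        "INSERT INTO loan (customer_id, amount, status) VALUES (?, ?, ?)",
        (customer_id, amount, status),
    )
conn.commit()

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
loan_count = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
print(customer_count, loan_count)
```

Storing each customer once and referencing it from the loan table by key removes the repeated name and city values that a flat CSV carries on every row.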
Task 3: Research Design
Detailed implementation of five modelling solutions:
Chi-Square Test for Loan Status by City
Logistic Regression for Loan Status Prediction
Random Forest Classifier for Spending Habits and Loan Status
ANOVA for Card Type and Transaction Amounts
K-Means Clustering for Identifying Valuable Customers
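The five techniques listed above could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual implementation: the column names (city, card_type, income, transaction_amount, loan_approved), the feature choices, and the number of clusters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, f_oneway
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the bank data (all columns are hypothetical).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "city": rng.choice(["A", "B", "C"], size=n),
    "card_type": rng.choice(["Gold", "Silver", "Platinum"], size=n),
    "income": rng.normal(50_000, 12_000, size=n),
    "transaction_amount": rng.gamma(2.0, 150.0, size=n),
})
# Synthetic label: higher income makes approval more likely.
approval_prob = 1 / (1 + np.exp(-(df["income"] - 50_000) / 10_000))
df["loan_approved"] = (rng.random(n) < approval_prob).astype(int)

# 1. Chi-square test of independence: loan status by city.
contingency = pd.crosstab(df["city"], df["loan_approved"])
chi2, chi2_p, dof, _ = chi2_contingency(contingency)

# 2. One-way ANOVA: transaction amounts across card types.
groups = [g["transaction_amount"].to_numpy() for _, g in df.groupby("card_type")]
f_stat, anova_p = f_oneway(*groups)

# 3 and 4. Logistic regression and random forest for loan status prediction.
X = df[["income", "transaction_amount"]]
y = df["loan_approved"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
logit = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
logit_acc = logit.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)

# 5. K-means clustering on scaled features to segment customers.
scaled = StandardScaler().fit_transform(X)
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

print(f"chi-square p={chi2_p:.3f}, ANOVA p={anova_p:.3f}")
print(f"logistic acc={logit_acc:.2f}, forest acc={forest_acc:.2f}")
print(pd.Series(segments).value_counts().sort_index())
```

Features are standardized before logistic regression and K-means because both are sensitive to feature scale; the random forest is fitted on the raw values since tree splits are not.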
Task 4: Experimental Results and Analysis
Presentation of findings and discussions on how results help with risk assessment and customer retention strategies.
Evaluation of limitations and accuracy of the modelling techniques.
Python 3.x
SQLite
Required Python libraries:
pandas
numpy
scipy
scikit-learn
matplotlib
seaborn