This project uses SQL to analyze a synthetic mobile money transaction dataset and explore fraud detection challenges.
- PaySim Dataset on Kaggle
- Note: For this project, only the first 10,000 rows of the dataset were used to simplify analysis and improve performance within Microsoft SQL Server.
File | Description |
---|---|
schema.sql |
Creates users , transactions , transaction_types tables |
seed.sql |
Populates tables from raw PaySim data |
queries.sql |
Exploratory queries and fraud detection logic |
users
: all unique user IDs from senders/receiverstransaction_types
: unique transaction types (e.g. TRANSFER, CASH_OUT)transactions
: main table linking users, types, and fraud indicators
- Out of 10,000 transactions, 68 were marked as real fraud (
isFraud = 1
) - The built-in fraud detection rule (
isFlaggedFraud
) failed to detect any of them (0% success) - Fraudsters often use TRANSFER and CASH_OUT types
- Dataset is synthetic and simplified (10k rows used from full 6M+)
- Balance columns can't be used to detect fraud
- Apply this logic to full dataset for deeper patterns
- Build custom fraud flagging logic in SQL
- Create dashboards using Power BI