This project aims to detect fraudulent credit card transactions using machine learning techniques. Given the highly imbalanced nature of fraud detection, we apply various resampling techniques and model evaluation metrics to ensure robust predictions.
The dataset used for this project is the Credit Card Fraud Detection dataset available on Kaggle. It consists of:
- 284,807 transactions
- 492 fraudulent transactions (0.172% of total data)
- Features: 30 columns (V1-V28 are PCA-transformed), `Time`, `Amount`, and `Class` (target variable: 0 = Legitimate, 1 = Fraudulent)
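
To make the imbalance concrete, here is a small sketch that uses only the two counts quoted above. It shows why plain accuracy is a poor yardstick: a model that labels every transaction as legitimate already scores very high. The variable names are ours and purely illustrative.

```python
# Counts taken from the dataset description above.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492

# Share of all transactions that are fraudulent.
fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = 1 - fraud_rate

# Number of legitimate transactions per fraudulent one.
imbalance_ratio = (TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS) / FRAUD_TRANSACTIONS

print(f"Fraud rate: {fraud_rate:.2%}")
print(f"All-legitimate baseline accuracy: {baseline_accuracy:.2%}")
print(f"Legitimate-to-fraud ratio: {imbalance_ratio:.0f}:1")
```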
- Handling class imbalance using SMOTE (Synthetic Minority Over-sampling Technique) and undersampling.
- Scaling numerical features using `StandardScaler` or `MinMaxScaler`.
- Feature engineering to extract meaningful transaction patterns.
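
The two resampling ideas can be sketched in a few lines of NumPy. This is a simplified illustration on synthetic data, not the code used in this project and not the imbalanced-learn implementation of SMOTE: `smote_sketch` and `undersample` are hypothetical helper names, and the choice of `k=5` neighbours is an assumption. In practice, resampling should be applied to the training split only so that the test set keeps the real class ratio.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its neighbours for every new row.
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


def undersample(X, y, seed=0):
    """Keep all minority rows and an equally sized random majority subset."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    kept = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, kept])
    rng.shuffle(idx)
    return X[idx], y[idx]


# Synthetic stand-in data: 1000 rows, 2% labelled as fraud.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:20] = 1

X_min = X[y == 1]
n_needed = int((y == 0).sum() - (y == 1).sum())
synthetic = smote_sketch(X_min, n_new=n_needed)
X_over = np.vstack([X, synthetic])
y_over = np.concatenate([y, np.ones(n_needed, dtype=int)])

X_under, y_under = undersample(X, y)
print(np.bincount(y_over), np.bincount(y_under))
```

Oversampling keeps every legitimate transaction at the cost of a larger training set, while undersampling is cheap but discards most of the majority class.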
- Distribution of transaction amounts and time.
- Fraudulent vs. legitimate transaction patterns.
- Correlation analysis of PCA-transformed features.
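
A rough sketch of how these three checks might look in pandas. The frame below is synthetic and only mimics the column layout; with the real data it would be loaded from the Kaggle CSV instead (the path in the comment is an assumption). The `Hour` column is our own illustrative feature and assumes `Time` is measured in seconds since the first transaction.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with the same column layout. With the real data this
# would be something like: df = pd.read_csv("creditcard.csv")  (assumed path)
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "Time": np.sort(rng.uniform(0, 172_800, n)),
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(scale=80.0, size=n),
    "Class": (rng.random(n) < 0.02).astype(int),
})

# Distribution of transaction amounts, split by class.
amount_by_class = df.groupby("Class")["Amount"].describe()

# Fraud rate per hour of day, derived from the Time column.
df["Hour"] = (df["Time"] // 3600 % 24).astype(int)
fraud_by_hour = df.groupby("Hour")["Class"].mean()

# Correlation of every feature with the target, strongest first.
corr_with_class = (
    df.corr()["Class"]
    .drop("Class")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)

print(amount_by_class)
print(corr_with_class)
```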
We experimented with various machine learning models:
- ✅ Random Forest
- ✅ AdaBoost
- ✅ XGBoost
- ✅ LightGBM
- ✅ Neural Networks (Deep Learning Approach)
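
As an illustration of the training setup, here is a minimal scikit-learn sketch with a Random Forest standing in for any of the models above. The data is synthetic (`make_classification` with a 98/2 class split), and the hyperparameters are placeholders rather than the values used in this project. The stratified split keeps the fraud ratio the same in both partitions, and `class_weight="balanced"` is one assumed way of compensating for the imbalance without resampling.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 98% class 0, 2% class 1.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98], random_state=0
)

# Stratify so train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative hyperparameters; class weights offset the imbalance.
model = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Probability of the positive (fraud) class for each test row.
fraud_scores = model.predict_proba(X_test)[:, 1]
print(fraud_scores[:5])
```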
Since fraud detection is an imbalanced classification problem, we focus on:
- Precision, Recall, F1-score (to limit false negatives while keeping false positives in check)
- AUC-ROC Curve (to evaluate the model’s discriminatory power)
- Confusion Matrix (to analyze misclassification rates)
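
The sketch below shows how these metrics relate to the confusion matrix, using only the standard library. The counts and scores are hypothetical and are not results from this project. `roc_auc` uses the pairwise definition of AUC (the chance that a random fraudulent transaction is scored above a random legitimate one), which is quadratic in the number of samples and meant for illustration only.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the headline metrics from confusion-matrix counts.
    Assumes at least one predicted positive and one actual positive."""
    precision = tp / (tp + fp)   # share of fraud alerts that were real fraud
    recall = tp / (tp + fn)      # share of real fraud that was caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def roc_auc(pos_scores, neg_scores):
    """Fraction of (fraud, legitimate) pairs where the fraud case gets the
    higher score; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts for 10,000 transactions, 98 of them fraudulent.
metrics = classification_metrics(tp=80, fp=20, fn=18, tn=9882)
# Hypothetical model scores for two fraud and three legitimate cases.
auc = roc_auc(pos_scores=[0.9, 0.4], neg_scores=[0.5, 0.3, 0.1])

print(metrics)
print(auc)
```

With these counts the accuracy is far higher than the recall, which is why accuracy alone is not reported.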
This project utilizes AWS SageMaker for deploying the trained fraud detection model. The deployment steps include:
- Using built-in SageMaker algorithms or custom scripts.
- Training the model with SageMaker’s managed Jupyter notebooks.
- Deploying the trained model as a real-time endpoint.
- Using SageMaker Inference for making predictions on new transactions.
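
Once an endpoint is live, predictions can be requested through the SageMaker runtime API. The following is a sketch under several assumptions: the endpoint name is hypothetical, the model container is assumed to accept header-less `text/csv` input, and the response format depends on the container, so it is returned as raw text. Real requests would need all feature columns in the order used for training; the short rows below are made-up values for illustration.

```python
def to_csv_payload(rows):
    """Serialize feature rows as header-less CSV, one transaction per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def predict(endpoint_name, rows):
    """Send rows to a deployed endpoint and return the raw response text.
    Requires AWS credentials and an existing endpoint."""
    import boto3  # imported lazily so to_csv_payload works without AWS set up

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",  # assumption: the container accepts CSV
        Body=to_csv_payload(rows),
    )
    return response["Body"].read().decode("utf-8")


# Made-up, truncated rows; a real request carries every training feature.
payload = to_csv_payload([[0.0, -1.36, 0.07, 149.62], [1.0, 1.19, 0.27, 2.69]])
print(payload)

# Example call (not executed here; "fraud-detection-endpoint" is hypothetical):
# scores = predict("fraud-detection-endpoint", [[0.0, -1.36, 0.07, 149.62]])
```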
- The best model achieved:
  - 88.1% ROC AUC score
  - 85.2% F1-score
- Random Forest and XGBoost were the most balanced models in terms of performance.
- Neural Networks performed exceptionally well when ample compute was available, with AUC close to 0.999.
- Implementing deep learning architectures for better accuracy.
- Deploying the model using Flask or FastAPI for API-based fraud detection.
- Enhancing real-time fraud detection using streaming data (e.g., Apache Kafka).
Contributions are welcome! Feel free to submit pull requests or raise issues.
This project is licensed under the MIT License.
- Open-source Machine Learning & Data Science Communities