This project focuses on detecting fraudulent credit card transactions using machine learning techniques. The dataset is highly imbalanced and requires special handling to effectively identify fraudulent behavior while minimizing false positives.
Details:
- Contains 284,807 transactions, including 492 fraud cases.
- Features: 30 total (`Time`, `Amount`, and anonymized variables `V1` to `V28`), plus the target column `Class` (0 = Normal, 1 = Fraud).
- Highly imbalanced (~0.17% fraud rate).
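
The imbalance figure follows directly from the two counts quoted above. A quick check, using only those numbers (the variable names are just for illustration):

```python
# Counts quoted in the dataset details above.
total_transactions = 284_807
fraud_cases = 492

# Share of transactions labelled as fraud.
fraud_rate = fraud_cases / total_transactions

# How many normal transactions there are for each fraud case.
normal_per_fraud = (total_transactions - fraud_cases) / fraud_cases

# Accuracy of a trivial model that labels every transaction as normal.
always_normal_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Normal transactions per fraud case: {normal_per_fraud:.0f}")
print(f"Accuracy of always predicting Normal: {always_normal_accuracy:.2%}")
```

The last number is the reason accuracy on its own says little here: a model that never flags fraud still scores very high on it.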
Objectives:
- Explore and preprocess the dataset
- Train and evaluate multiple classification models
- Compare performance metrics to identify the most effective approach
Models:
- k-Nearest Neighbors (KNN)
- Logistic Regression
- Support Vector Machines (SVM)
- Naive Bayes
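
A minimal sketch of how the four model families listed above could be trained and compared with scikit-learn. It is not the project's actual code: the data is a synthetic stand-in from `make_classification`, and the hyperparameters, the scaling step, and the choice of `GaussianNB` as the Naive Bayes variant are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 30 features,
# with the positive class as a small minority.
X, y = make_classification(
    n_samples=5000,
    n_features=30,
    n_informative=10,
    weights=[0.98, 0.02],
    random_state=42,
)

# Stratify so the rare class keeps the same proportion in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Distance- and margin-based models get a scaling step; settings are assumptions.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Naive Bayes": GaussianNB(),
}

# Fit each model and score it on the positive class only.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(
        f"{name:<20} P={scores['precision']:.3f} "
        f"R={scores['recall']:.3f} F1={scores['f1']:.3f}"
    )
```

To run the same loop on the real data, `X` and `y` would be replaced by the feature columns and the `Class` column of the dataset.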
Given the dataset's imbalance, the following metrics were prioritized:
- Precision
- Recall
- F1-Score
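
To make the three metrics concrete, here is a small sketch that computes them for the fraud class from confusion-matrix counts. The helper name `precision_recall_f1` and the counts are hypothetical and chosen only for illustration; they are not results from this project.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive (fraud) class."""
    # Precision: of everything flagged as fraud, how much really was fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real fraud cases, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: 80 frauds caught, 40 false alarms, 20 frauds missed.
tp, fp, fn = 80, 40, 20
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# A model that never flags fraud scores zero on all three metrics,
# even though its accuracy on a heavily imbalanced dataset would look excellent.
print(precision_recall_f1(tp=0, fp=0, fn=100))
```

None of these metrics counts true negatives, which is why they stay informative when normal transactions vastly outnumber fraud.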
Tools and libraries:
- Python 3.x
- pandas, numpy, matplotlib, seaborn
- scikit-learn
This was one of my early machine learning projects, built to explore fraud detection on an imbalanced dataset with basic classification models. While the implementation is quite simple by my current standards, it helped me understand key ML concepts such as feature importance, the precision-recall tradeoff, and data preprocessing.