This project builds a machine learning model to detect fraudulent transactions using the IEEE-CIS Fraud Detection dataset. AutoGluon’s AutoML capabilities automate most of the pipeline, from data preprocessing to model selection and tuning, so a competitive classifier can be trained with very little code.
The dataset comprises both transaction and identity information, making it ideal for building a comprehensive fraud detection model. The key files used in this project are:
- `train_transaction.csv`: Transaction data with the target label (`isFraud`).
- `train_identity.csv`: Identity attributes corresponding to transactions.
- `test_transaction.csv`: Transaction data for prediction (no labels).
- `test_identity.csv`: Identity attributes for test transactions.
- `sample_submission.csv`: A template for submitting the final predictions.

The workflow consists of the following steps:

- Merge `train_transaction.csv` with `train_identity.csv` to create the training dataset.
- Similarly, merge `test_transaction.csv` with `test_identity.csv` for the test dataset.
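
The merge step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the transaction and identity tables share a `TransactionID` key column, and it uses a left join so that transactions without identity records are kept. The helper name `merge_transaction_identity` and the tiny synthetic frames are hypothetical stand-ins for the real CSV files.

```python
import pandas as pd


def merge_transaction_identity(
    transaction: pd.DataFrame, identity: pd.DataFrame
) -> pd.DataFrame:
    """Left-join identity attributes onto transactions.

    Assumption: both tables share a `TransactionID` key column.
    A left join keeps every transaction, including those that
    have no matching identity row (their identity fields become NaN).
    """
    return transaction.merge(identity, on="TransactionID", how="left")


# Tiny synthetic frames standing in for the real CSV files (hypothetical values).
transactions = pd.DataFrame(
    {
        "TransactionID": [1, 2, 3],
        "TransactionAmt": [50.0, 20.0, 75.5],
        "isFraud": [0, 1, 0],
    }
)
identity = pd.DataFrame(
    {"TransactionID": [1, 3], "DeviceType": ["mobile", "desktop"]}
)

merged = merge_transaction_identity(transactions, identity)
print(merged)
```

With the real data, the two inputs would typically come from `pd.read_csv("train_transaction.csv")` and `pd.read_csv("train_identity.csv")`, and the same helper would be applied to the test files.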
- Use AutoGluon’s `TabularPredictor` to create a classification model.
- Specify `isFraud` as the label and use `roc_auc` as the evaluation metric, given the imbalanced nature of the dataset.
- Train the model using a subset of the data with AutoGluon’s `presets='good_quality'` to balance model quality and training speed.
- Generate probability scores for the test data and prepare the submission file (`my_submission.csv`).
- Save and download the submission file for evaluation on Kaggle.
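
The training and submission steps above might look something like the sketch below. It is an illustration under stated assumptions rather than the project's exact code: the helper names, the `subsample_size` and `time_limit` defaults, and the random seed are hypothetical, and the submission is assumed to need `TransactionID` and `isFraud` columns to match the layout of `sample_submission.csv`. AutoGluon is imported inside the training function so that the submission helper can be used on its own.

```python
import pandas as pd


def train_fraud_predictor(
    train_data: pd.DataFrame,
    subsample_size: int = 100_000,  # hypothetical subset size
    time_limit: int = 600,          # hypothetical training budget in seconds
):
    """Fit an AutoGluon TabularPredictor on a random subset of the training data."""
    # Imported lazily so the rest of this sketch works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    # Train on a subset to keep the run time manageable.
    n_rows = min(subsample_size, len(train_data))
    subset = train_data.sample(n=n_rows, random_state=0)

    predictor = TabularPredictor(label="isFraud", eval_metric="roc_auc")
    predictor.fit(subset, presets="good_quality", time_limit=time_limit)
    return predictor


def build_submission(predictor, test_data: pd.DataFrame) -> pd.DataFrame:
    """Return a submission frame with one fraud probability per test transaction."""
    # as_multiclass=False asks for the positive-class probability only.
    fraud_proba = predictor.predict_proba(test_data, as_multiclass=False)
    return pd.DataFrame(
        {
            "TransactionID": test_data["TransactionID"].to_numpy(),
            "isFraud": pd.Series(fraud_proba).to_numpy(),
        }
    )


def run_pipeline(train_data: pd.DataFrame, test_data: pd.DataFrame) -> pd.DataFrame:
    """Train, predict, and write my_submission.csv (requires AutoGluon)."""
    predictor = train_fraud_predictor(train_data)
    submission = build_submission(predictor, test_data)
    submission.to_csv("my_submission.csv", index=False)
    return submission
```

Here `run_pipeline` would be called with the merged training and test frames. Submitting probabilities rather than hard 0/1 labels fits the `roc_auc` metric, which scores how well the predictions rank fraudulent transactions above legitimate ones.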
Tutorial link: https://youtu.be/ilewdbDnjTU