This project analyzes and detects fraudulent transactions using a dataset of financial transactions. The workflow includes data exploration, visualization, feature engineering, and building a machine learning model to classify transactions as fraudulent or not. The trained model is deployed using Streamlit for interactive visualization and prediction.
- The dataset is loaded from an Excel file (Fraud.xlsx).
- Key columns include transaction type, amount, balances, and fraud indicators.
- Data Exploration & Cleaning
  - Checked for missing values and data types.
  - Explored class distribution for isFraud and isFlaggedFraud.
  - Visualized transaction types and fraud rates.
- Feature Engineering
  - Created new features such as balance differences.
  - Filtered and analyzed suspicious patterns (e.g., zero balances after transfer).
- Visualization
  - Plotted distributions of transaction amounts.
  - Visualized fraud rates by transaction type and over time.
  - Plotted correlation heatmaps for key features.
- Model Building
  - Selected features and split data into training and test sets.
  - Preprocessed data using scaling and one-hot encoding.
  - Built a pipeline with logistic regression (balanced class weights).
  - Evaluated the model with a classification report and confusion matrix.
- Model Saving
  - Saved the trained pipeline using joblib for future use.
- Deployment with Streamlit
  - Developed a Streamlit app for interactive visualization and prediction.
  - Users can upload transaction data and get real-time fraud predictions.
  - Visualizations of transaction patterns and model results are available in the app.
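
The feature engineering and model building steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it uses a small synthetic table in place of Fraud.xlsx, and every column name other than isFraud (type, amount, oldbalanceOrg, newbalanceOrig, balanceDiffOrig) is an assumption, as are the toy labelling rule and the split parameters.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for Fraud.xlsx (assumed column names, made-up values).
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "type": rng.choice(["TRANSFER", "CASH_OUT", "PAYMENT"], size=n),
    "amount": rng.exponential(1000.0, size=n),
    "oldbalanceOrg": rng.exponential(5000.0, size=n),
})
df["newbalanceOrig"] = (df["oldbalanceOrg"] - df["amount"]).clip(lower=0)
# Toy labelling rule so the example has a learnable signal (not real fraud logic).
df["isFraud"] = ((df["type"] == "TRANSFER") & (df["amount"] > 1500)).astype(int)

# Feature engineering: balance difference on the originating account.
df["balanceDiffOrig"] = df["oldbalanceOrg"] - df["newbalanceOrig"]

numeric = ["amount", "oldbalanceOrg", "newbalanceOrig", "balanceDiffOrig"]
categorical = ["type"]
X = df[numeric + categorical]
y = df["isFraud"]

# Stratify so the rare fraud class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scale numeric columns and one-hot encode the transaction type.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Logistic regression with balanced class weights, as described above.
pipeline = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are fitted on the training split only and are applied the same way at prediction time.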
- Accuracy: ~94%
- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- joblib
- streamlit
Install dependencies with:
pip install pandas numpy matplotlib seaborn scikit-learn joblib streamlit
- Place Fraud.xlsx in the specified directory.
- Run the notebook Fraud_Detection.ipynb step by step to train and save the model.
- Start the Streamlit app:
streamlit run app.py
- Use the web interface to visualize data and make predictions.
- The model provides classification metrics and a confusion matrix for fraud detection.
- Visualizations help understand transaction patterns and fraud distribution.
- The Streamlit app allows for interactive exploration and prediction.
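
Behind the web interface, prediction amounts to loading the saved pipeline with joblib and scoring the uploaded rows. The sketch below shows that step without the Streamlit UI. It is an illustration under assumptions: the helper predict_fraud, the file name fraud_pipeline.joblib, the feature columns (type, amount), and the tiny training table are all hypothetical and only there so the example runs on its own.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def predict_fraud(model, transactions: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the input with a predicted label and fraud probability."""
    scored = transactions.copy()
    scored["predicted_fraud"] = model.predict(transactions)
    # Column 1 of predict_proba is the probability of the positive class (1).
    scored["fraud_probability"] = model.predict_proba(transactions)[:, 1]
    return scored


# Tiny stand-in pipeline (made-up data) in place of the one trained in the notebook.
train = pd.DataFrame({
    "type": ["TRANSFER", "PAYMENT", "CASH_OUT", "PAYMENT", "TRANSFER", "CASH_OUT"],
    "amount": [9000.0, 120.0, 7500.0, 60.0, 8800.0, 300.0],
    "isFraud": [1, 0, 1, 0, 1, 0],
})
features = ["type", "amount"]
model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["amount"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["type"]),
    ])),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(train[features], train["isFraud"])

# Save and reload the whole pipeline, as the app would do at startup.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fraud_pipeline.joblib")  # hypothetical file name
    joblib.dump(model, path)
    loaded = joblib.load(path)

# Stand-in for a table uploaded through the web interface.
uploaded = pd.DataFrame({"type": ["TRANSFER", "PAYMENT"], "amount": [9500.0, 80.0]})
result = predict_fraud(loaded, uploaded)
print(result)
```

In a Streamlit script, the uploaded table would typically come from st.file_uploader and the scored frame could be shown with st.dataframe; how app.py actually wires this up is not specified here.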
This project is for educational purposes.