This project is a Streamlit application for end-to-end time series forecasting and analysis. It allows users to:
- Upload a CSV dataset containing a time series.
- Visualize the original time series with rolling statistics (mean and standard deviation).
- Decompose the series into Observed, Trend, Seasonal, and Residual components (Additive/Multiplicative).
- Forecast future values using ARIMA, ETS (Exponential Smoothing), or Prophet models.
- Evaluate & Compare models using performance metrics (MSE, RMSE, MAE, MAPE).
- Export trained models in `.sav` format for later use.
- Interactive Widgets: Pythonic code using Streamlit widgets (`file_uploader`, `selectbox`, `number_input`, `radio`, `button`).
- Plotly Integration: Interactive, zoomable graphs for all visualizations and forecasts.
- Session State: Persistent performance metrics across multiple model runs.
- Model Export: Trained models serialized via `joblib` and downloadable in `.sav` format.
- Conditional Formatting: Performance table highlights the best (green), worst (red), and intermediate (yellow) models by RMSE.
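As a rough illustration of how the widgets and session state fit together, here is a minimal sketch (widget labels and the `metrics` key are illustrative, not the app's actual identifiers):

```python
import streamlit as st
import pandas as pd

# Streamlit reruns the script on every interaction, so initialize the
# metrics store only once per session.
if "metrics" not in st.session_state:
    st.session_state["metrics"] = []

uploaded = st.file_uploader("Upload a CSV time series", type="csv")
model_name = st.selectbox("Model", ["ARIMA", "ETS", "Prophet"])
horizon = st.number_input("Forecast horizon", min_value=1, value=12)

if uploaded is not None and st.button("Train and Forecast"):
    # ... train the selected model and compute metrics here ...
    st.session_state["metrics"].append(
        {"Model": model_name, "RMSE": 0.0}  # placeholder values
    )

# The accumulated table persists across reruns within the session.
st.dataframe(pd.DataFrame(st.session_state["metrics"]))
```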
- Python 3.7 or newer
- Streamlit
- Pandas
- NumPy
- Plotly
- statsmodels
- prophet (optional, for Prophet model)
- joblib
- Clone this repository:

  ```bash
  git clone https://github.com/your-username/your-repo-name.git
  cd your-repo-name/src
  ```

- (Optional) Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate   # macOS/Linux
  venv\Scripts\activate      # Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
From the `src` directory, run:

```bash
streamlit run app.py
```

This will launch a local server (usually at http://localhost:8501). Open the URL in your browser.
- Click Browse files to upload a CSV with at least two columns: one datetime column and one numeric target column.
- Rolling Statistics: Choose a rolling window size to display the time series with its rolling mean and standard deviation.
- Select Additive or Multiplicative decomposition.
- Enter the seasonal period (e.g., 7 for weekly seasonality).
- View the four-panel decomposition plot.
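A minimal sketch of the computation behind these two views, assuming a DataFrame `df` with a datetime index and a numeric `value` column (both names are illustrative):

```python
import pandas as pd
import plotly.graph_objects as go
from statsmodels.tsa.seasonal import seasonal_decompose

window = 12  # rolling window size chosen in the UI

# Rolling mean and standard deviation over the chosen window.
rolling_mean = df["value"].rolling(window).mean()
rolling_std = df["value"].rolling(window).std()

fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=df["value"], name="Original"))
fig.add_trace(go.Scatter(x=df.index, y=rolling_mean, name="Rolling mean"))
fig.add_trace(go.Scatter(x=df.index, y=rolling_std, name="Rolling std"))

# Additive or multiplicative decomposition with the chosen seasonal period.
result = seasonal_decompose(df["value"], model="additive", period=7)
# result.observed, result.trend, result.seasonal, and result.resid
# feed the four-panel plot.
```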
- Choose a model from ARIMA, ETS, or Prophet.
- Enter the forecast horizon (number of future periods).
- Click Train and Forecast to see:
- Interactive forecast plot.
- Performance metrics table update.
- Download link for the trained model (`.sav`).
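For the ARIMA and ETS cases, training and forecasting might look roughly like this (the `order`, trend, and seasonal settings are placeholders, not the app's defaults; `df` is the uploaded series as in the sketch above):

```python
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.holtwinters import ExponentialSmoothing

horizon = 30  # number of future periods entered in the UI

# ARIMA: fit on the series, then forecast the requested horizon.
arima_fit = ARIMA(df["value"], order=(1, 1, 1)).fit()
arima_forecast = arima_fit.forecast(steps=horizon)

# ETS (Holt-Winters exponential smoothing).
ets_fit = ExponentialSmoothing(
    df["value"], trend="add", seasonal="add", seasonal_periods=7
).fit()
ets_forecast = ets_fit.forecast(horizon)
```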
- The bottom table lists all trained models with metrics:
- MSE, RMSE, MAE, MAPE.
- Best RMSE row is dark green (#337142).
- Worst RMSE row is dark red (#811414).
- Others are amber (#8b7400).
- Click Export Model to download the serialized model.
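A hedged sketch of the metric computation, the RMSE-based row highlighting, and the export step (helper names are illustrative; the hex colors match those listed above):

```python
import io
import joblib
import numpy as np
import pandas as pd
import streamlit as st

def compute_metrics(y_true, y_pred):
    """Standard regression metrics shown in the performance table."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mse = np.mean((y_true - y_pred) ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(y_true - y_pred)),
        "MAPE": np.mean(np.abs((y_true - y_pred) / y_true)) * 100,
    }

def highlight_by_rmse(row, best, worst):
    """Color a row green/red/amber depending on its RMSE rank."""
    if row["RMSE"] == best:
        color = "#337142"   # best
    elif row["RMSE"] == worst:
        color = "#811414"   # worst
    else:
        color = "#8b7400"   # intermediate
    return [f"background-color: {color}"] * len(row)

metrics_df = pd.DataFrame(st.session_state["metrics"])
styled = metrics_df.style.apply(
    highlight_by_rmse,
    best=metrics_df["RMSE"].min(),
    worst=metrics_df["RMSE"].max(),
    axis=1,
)
st.dataframe(styled)

# Export: serialize the fitted model (whichever was trained above)
# to bytes and offer it for download.
buffer = io.BytesIO()
joblib.dump(fitted_model, buffer)
st.download_button("Export Model", data=buffer.getvalue(), file_name="model.sav")
```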
```
├── README.md
├── requirements.txt
└── src
    └── app.py
```
- app.py: Main Streamlit application script.
- requirements.txt: List of Python packages required.
- README.md: This documentation file.
To deploy on Streamlit Cloud:
- Push your code to GitHub.
- In Streamlit Cloud, click New app and connect your GitHub repo.
- Set the `main.py` (or `src/app.py`) entrypoint and branch.
- Provide `requirements.txt` in the repo root.
- Deploy: Streamlit will install dependencies and launch your app.
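For reference, a minimal `requirements.txt` consistent with the dependency list above (unpinned; pin versions as needed):

```text
streamlit
pandas
numpy
plotly
statsmodels
prophet
joblib
```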
Contributions, issues, and feature requests are welcome! Feel free to:
- Fork the repository.
- Create a new branch for your feature/bug fix.
- Submit a pull request with a clear description.
This project is licensed under the MIT License. See the LICENSE file for details.
This project focuses on detecting fraudulent credit card transactions using machine learning techniques, versioned data pipelines, and MLOps best practices.
Fraudulent transactions represent a major financial risk for banks, customers, and businesses. This project builds a robust end-to-end system to:
- Analyze transaction patterns
- Handle highly imbalanced data
- Build and optimize various classification models
- Track experiments, version datasets, and automate pipelines
```
📦 project/
 ┣ 📂 data/
 ┃ ┣ 📂 raw/           # Original dataset (Kaggle)
 ┃ ┗ 📂 processed/     # Cleaned and feature-engineered datasets
 ┣ 📂 notebooks/       # EDA, visualization, modeling experiments
 ┣ 📂 src/             # Python scripts: cleaning, feature engineering, modeling
 ┣ 📂 models/          # Saved models
 ┣ 📄 requirements.txt # Python dependencies
 ┣ 📄 params.yaml      # Centralized parameters
 ┣ 📄 dvc.yaml         # DVC pipeline definition
 ┣ 📄 MLproject        # MLflow project definition
 ┗ 📄 README.md        # Project documentation (this file)
```
- EDA and Data Cleaning: Thorough exploration and cleaning of ~1.3M transaction records.
- Data Versioning: Using DVC to track versions of raw, cleaned, and model-ready datasets.
- Handling Imbalanced Data: Techniques like RandomUnderSampler and SMOTETomek to balance classes.
- Feature Engineering: Time-based aggregations, categorical encoding, and lagged feature generation.
- Modeling: Trained multiple classifiers (Logistic Regression, KNN, SVC, Decision Tree) with hyperparameter tuning.
- Experiment Tracking: Logged experiments with MLflow, tracking accuracy and ROC-AUC.
- Visualization: Interactive Plotly plots for fraud distribution, transaction trends, and model performance.
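A sketch of the balancing step with imbalanced-learn, assuming a feature matrix `X` and label vector `y` (names illustrative):

```python
from imblearn.under_sampling import RandomUnderSampler
from imblearn.combine import SMOTETomek

# Option 1: random undersampling of the majority (non-fraud) class.
rus = RandomUnderSampler(random_state=42)
X_under, y_under = rus.fit_resample(X, y)

# Option 2: SMOTETomek, i.e. SMOTE oversampling of the minority class
# combined with Tomek-link cleaning of ambiguous boundary samples.
smt = SMOTETomek(random_state=42)
X_bal, y_bal = smt.fit_resample(X, y)
```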
- Fraud distribution: Extremely imbalanced dataset (~0.5% fraud).
- Amount Analysis: Fraudulent transactions often have smaller average amounts.
- Category Analysis: Certain merchant categories have higher fraud rates.
- Time Series Decomposition: Decomposed transaction volumes into seasonal, trend, and residual components.
- Outlier Detection: Applied IQR and Z-score methods for amount outlier analysis.
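For reference, the IQR and Z-score rules applied to transaction amounts look roughly like this (the `amt` column name is an assumption):

```python
import numpy as np

amounts = df["amt"]

# IQR rule: flag values beyond 1.5 * IQR from the quartiles.
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = (amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)

# Z-score rule: flag values more than 3 standard deviations from the mean.
z = (amounts - amounts.mean()) / amounts.std()
z_outliers = z.abs() > 3
```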
| Step | Description |
|---|---|
| 1. Data Cleaning | Drop missing values, fix data types, engineer features |
| 2. Data Versioning | Track dataset versions with DVC |
| 3. Handling Imbalance | Apply undersampling and SMOTETomek |
| 4. Model Training | Train classifiers: LR, KNN, SVC, DT |
| 5. Hyperparameter Tuning | Use GridSearchCV for optimization |
| 6. Model Evaluation | Generate classification reports and ROC curves |
| 7. Experiment Tracking | Log runs in MLflow |
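A sketch of what the `dvc.yaml` stages might look like; stage names, script paths, and outputs below are illustrative, not the repository's actual definitions:

```yaml
stages:
  clean:
    cmd: python src/clean.py
    deps:
      - data/raw
      - src/clean.py
    params:
      - clean
    outs:
      - data/processed/clean.csv
  train:
    cmd: python src/train.py
    deps:
      - data/processed/clean.csv
      - src/train.py
    params:
      - train
    outs:
      - models/model.pkl
```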
| Model | Accuracy | ROC-AUC |
|---|---|---|
| Logistic Regression | XX% | XX |
| K-Nearest Neighbors | XX% | XX |
| Support Vector Classifier | XX% | XX |
| Decision Tree | XX% | XX |

(Actual metrics will vary based on your experiments.)
| Tool | Purpose |
|---|---|
| DVC | Data versioning and pipeline management |
| MLflow | Experiment tracking and model registry |
| Git | Code version control |
| Scikit-learn | Modeling and preprocessing |
| Plotly | Interactive visualizations |
| Pandas/NumPy | Data manipulation |
| imbalanced-learn | Class balancing techniques |
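A minimal MLflow tracking sketch consistent with the workflow above (experiment and run names are illustrative; the train/test split is assumed to come from the balanced data):

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# X_bal, y_bal: balanced features/labels from the resampling step above.
X_train, X_test, y_train, y_test = train_test_split(
    X_bal, y_bal, test_size=0.2, random_state=42, stratify=y_bal
)

mlflow.set_experiment("fraud-detection")

with mlflow.start_run(run_name="logistic_regression"):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    acc = accuracy_score(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("accuracy", acc)
    mlflow.log_metric("roc_auc", auc)
    mlflow.sklearn.log_model(model, "model")
```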
```bash
# Clone repository
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name

# Install dependencies
pip install -r requirements.txt
```
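Since the repo defines a DVC pipeline (`dvc.yaml`) and an MLflow project (`MLproject`), runs can presumably be reproduced with the standard commands:

```bash
# Reproduce the DVC pipeline stages
dvc repro

# Run the MLflow project entry point
mlflow run .
```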
- Add deep learning models (TCN, LSTM)
- Build ensemble meta-model
- Create Streamlit UI for real-time prediction
- Deploy with Docker on AWS/GCP
- Integrate CI/CD for full MLOps pipeline
- Credit Card Transactions Dataset - Kaggle
- DVC Documentation
- MLflow Documentation
- Imbalanced-learn Documentation
Feel free to connect: