📊 SMS Fraud Detection and Reporting using LLM

🧠 Project Overview

This project implements an end-to-end pipeline for detecting SMS spam using LLM-based embeddings (Mistral), interpretable machine learning, and risk-aware reporting.

It includes:

- Exploratory Data Analysis (EDA)
- Embedding generation with the Mistral model served by Ollama
- Random Forest classifier with performance evaluation
- LIME explanations for interpretability
- Executive-level HTML/PDF reporting with an LLM-generated narrative

🗂️ Repository Structure

├── data/
│   ├── spam.csv                 # Raw dataset (UCI SMS Spam Collection, via Kaggle)
│   ├── model_metrics.csv        # Saved model evaluation metrics
├── plots/
│   ├── *.png                    # Visuals from EDA, LIME, Confusion Matrix
├── reports/
│   ├── fraud_detection_report.html
│   ├── fraud_detection_report.pdf
├── notebooks/
│   ├── 01_EDA.ipynb
│   ├── 02_llm_finetuning_prediction.ipynb
│   ├── 03_llm_executive_report.ipynb
├── README.md
├── requirements.txt

📥 Installation

Clone the repository:

git clone https://github.com/sumitdeole/LLM_text_data.git
cd LLM_text_data

Create environment and install dependencies:

conda create -n sms-fraud python=3.10 -y
conda activate sms-fraud
pip install -r requirements.txt

Install additional system dependencies (Windows):
- WeasyPrint, used for PDF export, requires the GTK3 runtime on Windows
- Add C:\Program Files\GTK3-Runtime Win64\bin to your system PATH

⚙️ Notebooks

1. EDA + Feature Engineering

Loads and visualizes data
Generates word clouds and top spam unigrams
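The "top spam unigrams" ranking can be sketched with the standard library alone. This is a minimal sketch, not the notebook's code: the tokenizer (lower-cased alphabetic runs), the small `STOP_WORDS` set, and the `top_unigrams` helper are all hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption: the notebook's actual list is not specified).
STOP_WORDS = {"a", "an", "and", "the", "to", "you", "your", "for", "is", "of", "in", "on", "or", "now"}


def tokenize(text):
    """Lower-case the message and keep runs of letters only (digits and punctuation dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def top_unigrams(messages, labels, target="spam", k=10):
    """Return the k most frequent non-stop-word unigrams among messages labelled `target`."""
    counts = Counter()
    for text, label in zip(messages, labels):
        if label == target:
            counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(k)


if __name__ == "__main__":
    messages = [
        "WINNER! Claim your free prize now",
        "Free entry: text WIN to claim a prize",
        "Are we still on for lunch?",
    ]
    labels = ["spam", "spam", "ham"]
    print(top_unigrams(messages, labels, k=3))
```

If `data/spam.csv` follows the common Kaggle layout (label in column `v1`, text in `v2`, Latin-1 encoded), the two lists could be filled with `csv.reader(open("data/spam.csv", encoding="latin-1"))`; check the header row before relying on that assumption.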

2. Model Training with Mistral

Generates 4096-D embeddings using Ollama's Mistral model
Trains a balanced Random Forest
Saves metrics and plots
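Below is a hedged sketch of two parts of this notebook that need no extra packages: requesting an embedding from a locally running Ollama server, and computing spam-class metrics for `data/model_metrics.csv`. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/embeddings` endpoint, which takes `model` and `prompt` and returns an `embedding` list (newer Ollama releases also offer `/api/embed`). The helper names and the `metric,value` CSV layout are hypothetical, not the notebook's own.

```python
import csv
import json
import urllib.request

# Default address of a local Ollama server (assumption: standard install, no custom host).
OLLAMA_EMBEDDINGS_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text, model="mistral", url=OLLAMA_EMBEDDINGS_URL):
    """Build a JSON POST request asking Ollama to embed one SMS message."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="mistral", url=OLLAMA_EMBEDDINGS_URL, timeout=60):
    """Send the request to a running Ollama server and return the embedding as a list of floats."""
    request = build_embedding_request(text, model, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["embedding"]


def binary_metrics(y_true, y_pred, positive="spam"):
    """Precision, recall, F1 and accuracy, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def save_metrics(metrics, path):
    """Write metrics as metric,value rows (hypothetical layout for model_metrics.csv)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["metric", "value"])
        for name, value in metrics.items():
            writer.writerow([name, value])
```

After `ollama pull mistral`, `embed("Free prize!")` should return a 4096-element list. For the balanced forest, scikit-learn's `RandomForestClassifier(class_weight="balanced")` is one option; it weights each class by `n_samples / (n_classes * class_count)`. The notebook may balance classes differently.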

3. Executive Reporting

Feeds metrics and text features to an LLM to generate the narrative
Renders HTML/PDF executive report with visuals
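The reporting step can be sketched as two functions: one turns the metrics and top spam terms into a prompt for the narrative LLM, the other places the returned narrative, a metrics table, and plot images into an HTML page. The prompt wording, the HTML template, and both function names are illustrative assumptions, not the notebook's actual implementation.

```python
import html
from string import Template


def build_narrative_prompt(metrics, top_terms):
    """Assemble an executive-summary prompt from numeric metrics and indicative spam terms."""
    metric_lines = "\n".join(f"- {name}: {value:.3f}" for name, value in metrics.items())
    return (
        "You are writing for a non-technical executive audience.\n"
        "Summarise the performance of an SMS spam classifier in three short paragraphs.\n"
        f"Evaluation metrics:\n{metric_lines}\n"
        f"Most indicative spam terms: {', '.join(top_terms)}\n"
        "Explain the business risk of missed spam (false negatives) "
        "and of blocked legitimate messages (false positives)."
    )


# Minimal page skeleton; $-placeholders are filled by render_report (hypothetical template).
REPORT_TEMPLATE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<section>$narrative</section>
<table>
<tr><th>Metric</th><th>Value</th></tr>
$rows
</table>
$images
</body></html>""")


def render_report(title, narrative, metrics, image_paths):
    """Render the narrative (blank-line separated paragraphs), metrics and images as HTML."""
    paragraphs = "".join(
        f"<p>{html.escape(p)}</p>" for p in narrative.split("\n\n") if p.strip()
    )
    rows = "\n".join(
        f"<tr><td>{html.escape(k)}</td><td>{v:.3f}</td></tr>" for k, v in metrics.items()
    )
    images = "\n".join(
        f'<img src="{html.escape(p)}" alt="{html.escape(p)}">' for p in image_paths
    )
    return REPORT_TEMPLATE.substitute(
        title=html.escape(title), narrative=paragraphs, rows=rows, images=images
    )
```

The narrative text could come from Ollama's `/api/generate` endpoint with `"stream": false`, which returns the generated text in a `response` field. For the PDF, WeasyPrint's `HTML(string=page, base_url=".").write_pdf("reports/fraud_detection_report.pdf")` renders the same HTML; setting `base_url` lets relative image paths under `plots/` resolve.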

📈 Example Outputs

Confusion Matrix
LIME Explanation

📄 License

MIT License

⭐ Star this repo

If you find this project helpful, feel free to give it a ⭐ on GitHub!
