Financial Fraud Detection Using AI

For a deeper understanding of the project, from motivation and data collection through model training to final results, see the project blog: https://www.amitkedia.com/project/67ce1a818013ee818192b171

Project Overview

Introduction: This project uses machine learning, deep learning, and Large Language Models (LLMs) to detect financial fraud. It is built on a comprehensive dataset derived from financial filings submitted to the U.S. Securities and Exchange Commission (SEC), and it aims to compare and improve AI models for identifying fraudulent financial activity.

Objective: The goal is to foster a collaborative platform where data scientists and researchers can develop, test, and improve AI models for detecting financial fraud.

Dataset Description

Source: The dataset includes financial filings from 170 companies, split equally (85 each) between companies involved in fraudulent activities and companies that were not.

Structure: Each dataset entry contains details such as Central Index Key (CIK), filing year, company name, and a categorical indicator of fraud.
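The entry structure described above can be sketched as a small record type. This is only an illustration: the class name `FilingRecord`, the field names, and the sample rows are assumptions and are not taken from the repository's code or from the published dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FilingRecord:
    """One dataset entry; the field names are illustrative assumptions."""
    cik: str           # Central Index Key, kept as a string to preserve leading zeros
    filing_year: int
    company_name: str
    is_fraud: bool     # categorical fraud indicator


def class_balance(records):
    """Count how many records fall into each fraud class."""
    return Counter("fraud" if r.is_fraud else "non-fraud" for r in records)


# Made-up sample rows, for illustration only.
sample = [
    FilingRecord("0000000001", 2005, "Example Corp A", True),
    FilingRecord("0000000002", 2006, "Example Corp B", False),
]
balance = class_balance(sample)
print(balance)
```

A check like `class_balance` is one simple way to confirm the equal split between the two classes before training.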

Final Dataset: The final dataset is published on Kaggle.

Data Preprocessing

Preprocessing steps involve text cleaning, tokenization, and transforming data into machine-readable formats, ensuring balanced and fair model training.
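As a rough illustration of the text cleaning and tokenization steps, the sketch below uses only the Python standard library. The function names and the specific cleaning rules (stripping markup, lowercasing, dropping digits and punctuation) are assumptions made for this example; the repository's preprocessing scripts may do this differently.

```python
import re

_TAG_RE = re.compile(r"<[^>]+>")          # HTML/XML-style tags found in raw filings
_NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # anything that is not a lowercase letter or whitespace
_SPACE_RE = re.compile(r"\s+")


def clean_text(raw):
    """Strip markup, lowercase, and drop digits and punctuation."""
    text = _TAG_RE.sub(" ", raw)
    text = text.lower()
    text = _NON_ALPHA_RE.sub(" ", text)
    return _SPACE_RE.sub(" ", text).strip()


def tokenize(raw):
    """Clean the text, then split it into whitespace-delimited tokens."""
    return clean_text(raw).split()


tokens = tokenize("<p>Net Revenue rose 12% this year.</p>")
print(tokens)  # ['net', 'revenue', 'rose', 'this', 'year']
```

Dropping digits is a deliberate simplification here; a pipeline that needs numeric values from the filings would keep them.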

Model Implementation

The project covers a range of models: Logistic Regression, Support Vector Machine (SVM), Random Forest, XGBoost, Artificial Neural Network (ANN), Hierarchical Attention Network (HAN), GPT-2, and FinBERT, selected for their NLP capabilities and potential in fraud detection.
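Comparing models this different requires scoring them on the same labels with the same metrics. The sketch below computes accuracy, precision, recall, and F1 for binary fraud labels in plain Python. The function name, the model names, and the toy predictions are hypothetical and are not results from this project.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for labels in {0, 1} (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels and predictions, for illustration only.
y_true = [1, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 0],
    "model_b": [1, 1, 1, 0],
}
scores = {name: binary_metrics(y_true, pred) for name, pred in predictions.items()}
for name, result in scores.items():
    print(name, result)
```

In this toy case both models have the same accuracy but trade precision against recall, which is why accuracy alone is a weak basis for comparing fraud detectors.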

To Reproduce

Codebase: Complete code for data extraction, preprocessing, model training, and evaluation is available in this repository.

Environment: A requirements.txt file is provided for setting up a consistent environment.

Documentation: README.md documents each script with instructions for environment setup, script execution, and result interpretation.

Contribution Guidelines

Getting Started:

Fork the repository.
Set up your environment with requirements.txt.
Familiarize yourself with the code and dataset.

Contributing:

Add or improve models, or refine preprocessing methods.
Ensure your code is documented and aligns with the project's style.
Submit pull requests with a detailed description of changes.

Reporting Issues:

Use GitHub Issues for bug reports, feature requests, or discussions.
Provide detailed bug descriptions and reproduction steps.

Community:

Engage in discussions, share results, and ask questions.
Adhere to community guidelines for a collaborative environment.

License

This project is open-source and available under the MIT License.

Acknowledgements

Thanks to all contributors and community members for their valuable participation and insights in advancing AI in financial fraud detection.

Repository Files

.gitignore
ANN.py
Data-Extraction-SEC.py
Data-Preprocesing.py
Data_cleaning_preprocessing.ipynb
Detailed_Report_on_financial_fraud_detection.pdf
FinBERT-Pretrain_new.py
FinBERT-Pretrained.py
GPT-2.py
Get_period_end_date_for_each_ticker.ipynb
HAN.py
Logistic_Regression.py
README.md
RandomForest.py
SVM.py
XGBoost.py
combining_rows.py
requirements.txt

Repository: amitkedia007/Financial-Fraud-Detection-Using-LLMs
