4KSHAT0p/naive-bayes

Fake News Detection Using Naive Bayes

This repository implements two variants of the Naive Bayes classifier for detecting fake news:

  • Naive Bayes raw (without TF-IDF)
  • Naive Bayes with TF-IDF (Term Frequency-Inverse Document Frequency)
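The code below computes TF-IDF weights from tokenized documents. It is a from-scratch sketch that uses one common definition: tf(t, d) is the count of t in d divided by the length of d, and idf(t) = log(N / df(t)). The repository's exact variant may differ, and scikit-learn's TfidfVectorizer uses a smoothed idf by default. The function name compute_tfidf is illustrative only.

```python
import math
from collections import Counter


def compute_tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    Returns one dict per document mapping term -> tf * idf.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log(n_docs / n) for t, n in df.items()}

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({t: (c / length) * idf[t] for t, c in counts.items()})
    return weights
```

A term that appears in every document gets an idf of 0, so it carries no weight. This is why very common words contribute little under TF-IDF.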

To maximize accuracy, the text is preprocessed with tokenization and stopword removal. Laplace smoothing is applied to the word probabilities, and in the TF-IDF variant the probabilities are also weighted by TF-IDF scores.
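The raw variant can be sketched as a multinomial Naive Bayes classifier with Laplace (add-alpha) smoothing. This version uses only the standard library and works in log space, so products of many small probabilities do not underflow. It is an illustration, not the repository's code. The names train_naive_bayes and predict, the alpha parameter, and the choice to skip words unseen in training are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on pre-tokenized documents.

    docs:   list of token lists, e.g. [["breaking", "news"], ...]
    labels: list of class labels, e.g. ["fake", "real", ...]
    alpha:  smoothing constant (alpha=1.0 is add-one Laplace smoothing)
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(labels)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        # Laplace smoothing: every vocabulary word gets a non-zero probability
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def predict(log_prior, log_likelihood, tokens):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for c, prior in log_prior.items():
        # Words never seen in training are skipped (one common convention)
        scores[c] = prior + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)
```

With alpha=1.0, a word that never occurs in one class still gets a small probability for that class instead of zero. A single unseen word therefore cannot veto a class on its own.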

Note: The classifiers are implemented entirely from scratch. External libraries are used only for text preprocessing and evaluation.
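The preprocessing step can be sketched as follows. This is a minimal version that assumes a simple regular-expression tokenizer and scikit-learn's built-in English stopword list (ENGLISH_STOP_WORDS). The repository may use a different tokenizer or stopword list, and the name preprocess is illustrative.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Hypothetical tokenizer: runs of lowercase ASCII letters after lowercasing
TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text, stop_words=ENGLISH_STOP_WORDS):
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in stop_words]
```

Passing a custom stop_words set makes it easy to compare stopword lists without changing the tokenizer.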

Dataset

The dataset used in this project is taken from here.

The columns used for training are:

  • text: The content of the news article.
  • label: The target label (fake or real).

To use your own dataset, make sure it has the same two columns (text and label), or modify the script to use your column names.
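The helper below maps a dataset onto the expected structure with pandas. It assumes the data has already been read into a DataFrame, for example with pd.read_csv. load_news and its parameters are hypothetical and not part of the repository.

```python
import pandas as pd


def load_news(df, text_col="text", label_col="label"):
    """Return a DataFrame with exactly the columns 'text' and 'label'.

    Renames custom column names to the expected ones and drops rows
    with missing text or labels.
    """
    missing = {text_col, label_col} - set(df.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    out = df[[text_col, label_col]].rename(
        columns={text_col: "text", label_col: "label"}
    )
    return out.dropna().reset_index(drop=True)
```

For example, suppose a CSV names its columns body and is_fake (hypothetical names). Then load_news(pd.read_csv(path), text_col="body", label_col="is_fake") returns a frame with the expected text and label columns.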

Prerequisites

  • Python
  • pandas
  • scikit-learn
  • numpy
  • matplotlib
  • seaborn

Results

Both models were evaluated on the dataset above only, with the following results:

Figures 1 and 2: evaluation result screenshots (2024-09-30)

  • Accuracy (Raw Naive Bayes): 96%
  • Accuracy (TF-IDF Naive Bayes): 97%
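Accuracy here is the fraction of test articles whose predicted label matches the true label. The sketch below computes accuracy and a confusion matrix with scikit-learn. It assumes string labels "fake" and "real"; the repository's own evaluation code and label encoding may differ. The name evaluate is illustrative.

```python
from sklearn.metrics import accuracy_score, confusion_matrix


def evaluate(y_true, y_pred, labels=("fake", "real")):
    """Return accuracy and a confusion matrix (rows = true, columns = predicted)."""
    acc = accuracy_score(y_true, y_pred)
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    return acc, cm
```

You can plot the resulting matrix with seaborn's heatmap function, for example sns.heatmap(cm, annot=True).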

Feel free to experiment with the models and improve their performance!

Suggested Improvements

  • Word stemming
  • A smaller fallback TF-IDF score: for terms that are not in the TF-IDF table, I set the TF-IDF score (named 'epsilon' in the code) to 1e-9. This value is relatively large compared with the range of probabilities in this dataset, so for fairer predictions a much smaller value such as 2e-308 can be used. In my evaluation, this change lowers the accuracy from 97% to 96%.
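The epsilon fallback can be sketched as follows. This is a hypothetical illustration, not the repository's code. It assumes that each class has its own TF-IDF table and that a word's TF-IDF weight multiplies its smoothed probability P(w|c). In log space, the score for class c is then log P(c) plus the sum of log P(w|c) + log weight(w). The names weighted_log_score, classify and TOY_MODEL are made up for the example.

```python
import math


def weighted_log_score(tokens, log_prior, log_likelihood, tfidf, epsilon=1e-9):
    """Hypothetical TF-IDF-weighted Naive Bayes score for one class.

    Assumes score = log P(c) + sum over tokens of log(P(w|c) * weight(w)),
    where weight(w) comes from this class's TF-IDF table and falls back to
    epsilon when w is missing from that table.
    """
    score = log_prior
    for w in tokens:
        if w not in log_likelihood:
            continue  # word never seen in training: ignored
        weight = tfidf.get(w, epsilon)
        # log(P * weight) = log P + log weight; log space avoids underflow
        score += log_likelihood[w] + math.log(weight)
    return score


def classify(tokens, model, epsilon=1e-9):
    """model maps class -> (log_prior, log_likelihood dict, tfidf dict)."""
    scores = {
        c: weighted_log_score(tokens, lp, ll, tf, epsilon)
        for c, (lp, ll, tf) in model.items()
    }
    return max(scores, key=scores.get)


# Toy model: "x" is missing from the "real" class's TF-IDF table.
TOY_MODEL = {
    "fake": (math.log(0.5),
             {"x": math.log(0.01), "y": math.log(0.01)},
             {"x": 0.5, "y": 0.5}),
    "real": (math.log(0.5),
             {"x": math.log(0.01), "y": math.log(0.5)},
             {"y": 0.5}),
}
tokens = ["x"] + ["y"] * 6
print(classify(tokens, TOY_MODEL, epsilon=1e-9))    # → real
print(classify(tokens, TOY_MODEL, epsilon=2e-308))  # → fake
```

Lowering epsilon from 1e-9 to 2e-308 raises the log-space penalty for a missing term from about 20.7 to about 708.5. In the toy example, this is enough to flip the prediction.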

About

Naive Bayes from scratch, achieving up to 96% accuracy
