This project applies natural language processing (NLP) and machine learning to a dataset of news articles to distinguish Fake News from Factual News. The pipeline covers preprocessing, tokenization, Named Entity Recognition (NER), sentiment analysis, topic modeling, and classification.
## Dataset

`fake_news_data.csv`: contains 198 news articles labeled as either "Fake News" or "Factual News".

Columns:
- `title`: headline of the news article.
- `text`: body of the news article.
- `date`: publication date.
- `fake_or_factual`: label ("Fake News" or "Factual News").
## Dependencies

- `pandas`, `matplotlib`, `seaborn`
- `spacy`, `nltk`, `re`
- `vaderSentiment`
- `gensim`
- `sklearn`
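These can be captured in a `requirements.txt` (a minimal sketch; note that `re` is part of the Python standard library and needs no entry, and that `sklearn` is installed under the package name `scikit-learn`):

```text
pandas
matplotlib
seaborn
spacy
nltk
vaderSentiment
gensim
scikit-learn
```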
## Preprocessing & Feature Extraction

- Lowercasing, punctuation removal, stopword filtering
- Tokenization using `nltk`
- Lemmatization using `WordNetLemmatizer`
- Named Entity Recognition with `spaCy`
- Sentiment scoring with `VADER`
- Bag of Words and TF-IDF features
## Exploratory Analysis

- Distribution of fake vs. factual news
- Part-of-speech tagging frequency
- Common named entities in each category
- Sentiment analysis across news types
- Top unigrams after preprocessing
## Topic Modeling

- LDA (Latent Dirichlet Allocation)
- LSA (Latent Semantic Analysis)
- Visualization of coherence scores for optimal topic number
## Classification Results

Two models were trained on Bag of Words features.

**Model 1**
- Accuracy: 90%
- Precision / Recall:
  - Fake News: 93% / 86%
  - Factual News: 88% / 94%

**Model 2**
- Accuracy: 83%
- Precision / Recall:
  - Fake News: 91% / 72%
  - Factual News: 78% / 94%
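The source does not name the two models, so the sketch below uses logistic regression over Bag of Words counts purely to illustrate the training setup (model choice, toy data, and split parameters are assumptions):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Toy stand-in for fake_news_data.csv (column names from the dataset section).
df = pd.DataFrame({
    "text": ["shocking miracle cure revealed", "senate passes annual budget",
             "aliens control the election", "court upholds state ruling",
             "secret plot exposed by insider", "new jobs report released"],
    "fake_or_factual": ["Fake News", "Factual News", "Fake News",
                        "Factual News", "Fake News", "Factual News"],
})

# Bag of Words features, as in the project.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["text"])
y = df["fake_or_factual"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test), zero_division=0))
```

`classification_report` produces the per-class precision/recall figures of the kind listed above.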
## Visualizations

- Count plots
- POS and NER distribution bars
- Sentiment bar charts
- LDA/LSA topic charts
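The label count plot, for instance, can be produced with `seaborn` (toy data below; the project would use the loaded DataFrame and its `fake_or_factual` column):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for the loaded dataset.
df = pd.DataFrame({"fake_or_factual": ["Fake News"] * 4 + ["Factual News"] * 3})

ax = sns.countplot(data=df, x="fake_or_factual")
ax.set_title("Fake vs. Factual News")
ax.set_ylabel("Article count")
plt.tight_layout()
plt.savefig("label_counts.png")
```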
## How to Run

1. Clone the repository.
2. Make sure `fake_news_data.csv` is in the root directory.
3. Install the dependencies: `pip install -r requirements.txt`
4. Run the analysis in a Jupyter Notebook or Python script.
## Author

Vishnu M
LinkedIn: linkedin.com/in/vishnu-m737