A machine learning project that classifies news articles as True or Fake using natural language processing and a combination of statistical and semantic features. The model uses pre-trained Word2Vec embeddings to extract meaningful linguistic characteristics and applies supervised learning algorithms to make high-accuracy predictions.
- Arnab Biswas
- Maddi Pranav Reddy
- Mohan Nishantam
Classify a given news text as True or Fake using supervised machine learning algorithms. Utilize the Google News Word2Vec (300-d) pre-trained embeddings for feature extraction and semantic representation.
- Text cleaning: removal of punctuation, numbers, brackets, and stop words.
- Lemmatization: focusing primarily on nouns (NN, NNS), as they hold key semantic value.
- Conversion to lowercase for normalization.
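The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's code: the tiny `STOP_WORDS` set, the helper names `clean_text` and `lemmatize_nouns`, and the assumption that "removing brackets" means dropping whole `[...]` spans (such as a `[Reuters]` tag) are all hypothetical. Because the description says lemmatization focuses "primarily" on nouns, the sketch exposes a `keep_only_nouns` flag instead of guessing whether other words are kept.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# full list such as NLTK's English stop words.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "was"}

NOUN_TAGS = {"NN", "NNS"}  # Penn Treebank tags for singular and plural common nouns


def clean_text(text: str) -> list[str]:
    """Lowercase, then drop bracketed spans, numbers, punctuation and stop words."""
    text = text.lower()
    text = re.sub(r"\[[^\]]*\]", " ", text)  # assumption: "[...]" spans are removed whole
    text = re.sub(r"\d+", " ", text)  # remove numbers
    # Replace every punctuation character with a space.
    text = text.translate(str.maketrans(string.punctuation, " " * len(string.punctuation)))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def lemmatize_nouns(tagged_tokens, lemmatize, keep_only_nouns=True):
    """Lemmatize NN/NNS tokens with the given callable; optionally drop other tags."""
    result = []
    for word, tag in tagged_tokens:
        if tag in NOUN_TAGS:
            result.append(lemmatize(word))
        elif not keep_only_nouns:
            result.append(word)
    return result
```

With NLTK installed and its tagger and WordNet data downloaded, one way to call it is `lemmatize_nouns(nltk.pos_tag(clean_text(text)), WordNetLemmatizer().lemmatize)`; `WordNetLemmatizer.lemmatize` treats words as nouns by default, which matches the NN/NNS focus.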
- Sentence Vectors: Sum of all word vectors in a sentence.
- Frobenius Norm: Captures the volume of the text's hyperspace.
- Word Similarity: Dot product between individual word vectors and the sentence vector.
- Text Length: Used as a proxy for editorial standards.
- Named Entity Recognition (NER): Counts of entity types (person, organization, location, etc.).
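A NumPy sketch of the embedding-based features and the NER counts is shown below. It is an illustration under stated assumptions, not the project's implementation: the function names and the `ENTITY_TYPES` tuple are hypothetical, text length is measured in tokens, the Frobenius norm is assumed to be taken over the matrix whose rows are the text's word vectors, and the per-word dot products are summarized by their mean because the description does not say how they are aggregated. `embeddings` can be any word-to-vector mapping; with gensim, the Google News vectors are typically loaded via `KeyedVectors.load_word2vec_format(path, binary=True)`.

```python
from collections import Counter

import numpy as np

# Hypothetical set of entity types to count (spaCy-style labels).
ENTITY_TYPES = ("PERSON", "ORG", "GPE", "LOC")


def embedding_features(tokens, embeddings, dim=300):
    """Sentence vector, Frobenius norm, word similarity and length for one text.

    Out-of-vocabulary tokens are skipped when building the word matrix.
    """
    vectors = [embeddings[tok] for tok in tokens if tok in embeddings]
    # Rows are word vectors: shape (n_known_words, dim).
    word_matrix = np.vstack(vectors) if vectors else np.zeros((0, dim))
    # Sentence vector: sum of all word vectors, shape (dim,).
    sentence_vector = word_matrix.sum(axis=0)
    # Frobenius norm: square root of the sum of squared entries of the word matrix.
    frobenius_norm = float(np.linalg.norm(word_matrix, ord="fro"))
    # Word similarity: dot product of each word vector with the sentence vector.
    similarities = word_matrix @ sentence_vector
    return {
        "sentence_vector": sentence_vector,
        "frobenius_norm": frobenius_norm,
        "mean_word_similarity": float(similarities.mean()) if vectors else 0.0,
        "text_length": len(tokens),  # assumption: length counted in tokens
    }


def ner_counts(entity_labels, entity_types=ENTITY_TYPES):
    """Count how many entities of each type appear in one text."""
    counts = Counter(entity_labels)
    return {label: counts[label] for label in entity_types}
```

Entity labels can come from any NER tagger; with spaCy, for instance, `ner_counts(ent.label_ for ent in nlp(text).ents)` after loading an English pipeline as `nlp`.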
- Models evaluated: Logistic Regression, Decision Tree, and Random Forest.
- Hyperparameter tuning using GridSearchCV.
- Evaluation metric: F1 Score, the harmonic mean of precision and recall, so a model cannot score well by favoring one at the expense of the other; the two classes are roughly balanced.
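The tuning step can be sketched with scikit-learn's `GridSearchCV`, using `scoring="f1"` so that each model's hyperparameters are selected by the same metric used for evaluation. The search spaces and the `tune_models` helper are hypothetical; only the Logistic Regression settings reported for the best model (liblinear solver, L1 penalty, and a value of 0.1, read here as `C`) come from this project, and they are included in the grid.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search spaces; GridSearchCV clones each base estimator.
CANDIDATES = {
    "logistic_regression": (
        LogisticRegression(solver="liblinear"),  # liblinear supports both l1 and l2
        {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1.0]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [5, 10, None], "min_samples_leaf": [1, 5]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [10, None]},
    ),
}


def tune_models(X_train, y_train, cv=5):
    """Run one GridSearchCV per model, selecting hyperparameters by F1 score."""
    searches = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, scoring="f1", cv=cv)
        search.fit(X_train, y_train)
        searches[name] = search
    return searches
```

After `searches = tune_models(X_train, y_train)`, each entry's `best_params_` and `best_score_` hold the selected hyperparameters and their mean cross-validated F1 score, and `best_estimator_` is refit on the full training set.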
- Text length distributions differ significantly between true and fake news.
- Word clouds and n-grams highlight each class's focus areas:
  - Fake news: more personal references, vague justifications, and visual proof.
  - True news: more institutional terms, verifiable sources, and an official tone.
- NER patterns: Fake news focuses disproportionately on persons; true news has a balanced distribution across types.
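The n-gram comparison can be reproduced with a small counting helper; running it separately on the fake and true subsets gives ranked n-gram lists that can be compared side by side. `top_ngrams` is a hypothetical name, and it expects already-cleaned token lists.

```python
from collections import Counter


def top_ngrams(token_lists, n=2, k=10):
    """Return the k most frequent n-grams (as tuples) across a collection of texts."""
    counts = Counter()
    for tokens in token_lists:
        # Zipping n staggered copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)
```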
Best model: Logistic Regression (solver: liblinear, penalty: l1, C (inverse regularization strength): 0.1)
- Accuracy: 93.06%
- Precision: 92.47%
- Recall: 93.03%
- F1 Score: 92.75%
- Outperformed the Decision Tree and Random Forest models on most metrics.
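As a quick consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the reported precision and recall reproduce the reported F1 score:

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall of the best model, as fractions.
f1 = f1_from_precision_recall(0.9247, 0.9303)
print(f"{100 * f1:.2f}%")  # prints 92.75%
```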
All files are provided in the ZIP archive Fake_News_Detection_ArnabBiswas_MaddiPranavReddy_MohanNishantam.zip, which contains:
- Fake_News_Detection.ipynb: Jupyter notebook containing the full implementation.
- Fake_News_Detection_Report.pdf: Final project report with analysis, EDA insights, and model evaluations.