Fake News Detection Project

Overview

This project tackles fake news detection with machine learning. It combines NearMiss undersampling with fine-tuned LSTM models, and the best model reaches a reported 98% accuracy in identifying fake news.

Key Features

Undersampling Methods:
- Applied the NearMiss undersampling technique to the Kaggle Fake and Real News Dataset.
- Reported a 10% improvement in model accuracy over training on the imbalanced data.

Fake News Classifier:
- Built a fake news classifier using LSTM (Long Short-Term Memory) models.
- Used a TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer for feature extraction.
- Reached 98% accuracy in detecting fake news.

Dataset

Source: Kaggle Fake and Real News Dataset
Content: 44,898 news articles labeled as "FAKE" or "REAL".
Columns: Title, Text (body text), Subject, and Date (publish date).

Approach

Data Preprocessing

Removed stopwords, punctuation, and special characters.
Converted the cleaned text into numerical features with a TF-IDF vectorizer.
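
The two preprocessing steps can be illustrated with a minimal, standard-library sketch. It is not the code from the project notebooks: the stopword list is a tiny placeholder, the sample sentences are invented, and the weighting uses a smoothed IDF of ln((1 + n) / (1 + df)) + 1 with L2 normalization, which is an assumption about the vectorizer settings rather than something this README states.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def clean_text(text):
    """Lowercase, replace punctuation/special characters with spaces, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [clean_text(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Invented example sentences, for illustration only.
docs = [
    "Breaking: the senate passed the budget!",
    "The senate is secretly run by aliens...",
]
vectors = tfidf(docs)
```

In this sketch a term that appears in every document ("senate") receives a lower weight than a term unique to one document ("budget"), which is the property that makes TF-IDF useful for separating articles.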

Addressing Data Imbalance

Technique: Applied NearMiss undersampling to balance the dataset.
Effectiveness: Evaluated the impact of undersampling by comparing model performance on balanced and imbalanced datasets.
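
As a rough illustration of how NearMiss chooses which majority-class samples to keep, the sketch below implements the NearMiss-1 selection rule in plain Python: retain the majority points whose mean distance to their k nearest minority points is smallest. The README does not say which NearMiss version or library the notebooks used, so the function name `nearmiss1`, the value of `k`, and the toy 2-D points are all assumptions.

```python
import math


def nearmiss1(majority, minority, k=3):
    """Keep len(minority) majority points with the smallest mean distance
    to their k nearest minority points (the NearMiss-1 selection rule)."""

    def mean_knn_distance(point):
        dists = sorted(math.dist(point, m) for m in minority)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    ranked = sorted(majority, key=mean_knn_distance)
    return ranked[: len(minority)]


# Toy 2-D feature vectors (hypothetical; real inputs would be TF-IDF rows).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0), (0.2, 0.1), (9.0, 9.0)]

kept = nearmiss1(majority, minority, k=2)
balanced = minority + kept  # both classes now have the same number of samples
```

In practice the `NearMiss` class in the imbalanced-learn package provides this behavior through `fit_resample`; the hand-written version is shown only to make the selection rule visible.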

Model Selection and Training

Basic Models: Naive Bayes, Logistic Regression, Support Vector Machine (SVM).
Advanced Models: Decision Trees, Random Forest, LSTM networks.
Hyperparameter Tuning: Used grid search and random search methods to find optimal hyperparameters.
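
The difference between the two tuning strategies can be sketched without any ML library. Grid search scores every combination in the search space, while random search scores a fixed number of sampled combinations. The parameter names, their values, and the `toy_score` function are hypothetical stand-ins; in the real workflow the score would come from training a model and evaluating it on validation data.

```python
import itertools
import random


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and return (best_params, best_score)."""
    names = list(param_grid)
    candidates = [
        dict(zip(names, values))
        for values in itertools.product(*param_grid.values())
    ]
    best = max(candidates, key=score_fn)
    return best, score_fn(best)


def random_search(param_grid, score_fn, n_iter, seed=0):
    """Score n_iter randomly sampled combinations instead of the full grid."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in param_grid.items()}
        for _ in range(n_iter)
    ]
    best = max(candidates, key=score_fn)
    return best, score_fn(best)


# Hypothetical search space; names and values are placeholders only.
param_grid = {
    "units": [32, 64, 128],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-2, 1e-3],
}


def toy_score(params):
    """Stand-in for a validation score; a real run would train and evaluate a model."""
    return (
        -abs(params["units"] - 64) / 64
        - abs(params["dropout"] - 0.2)
        - params["learning_rate"]
    )


best_params, best_score = grid_search(param_grid, toy_score)
sampled_params, sampled_score = random_search(param_grid, toy_score, n_iter=5)
```

Grid search here evaluates all 12 combinations, while random search evaluates only 5, so it is cheaper but may miss the best setting. That trade-off matters most for models that are slow to train, such as LSTMs.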

Evaluation Metrics

Accuracy, Precision, Recall, and F1 Score.
Emphasized Recall on the fake class, since a fake article misclassified as real (a false negative) is the costlier error.
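
For reference, the four metrics can be computed from predictions as in the sketch below, which treats "FAKE" as the positive class. The label lists are invented for illustration and are not results from this project.

```python
def classification_metrics(y_true, y_pred, positive="FAKE"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: the share of truly fake articles that the model caught.
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels, for illustration only.
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE", "REAL", "FAKE"]

metrics = classification_metrics(y_true, y_pred)
```

Recall falls whenever a fake article is predicted as real, so tracking it alongside accuracy shows how many fake articles slip through even when overall accuracy looks high.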

Results

Accuracy: The fine-tuned LSTM model achieved 98% accuracy.
Improvement: NearMiss undersampling increased accuracy by 10%.

Conclusion

This project demonstrates that LSTM models combined with undersampling techniques are effective at detecting fake news. By addressing data imbalance and improving model accuracy, the solution offers a practical tool for combating misinformation.

Future Work

Explore Different Architectures: Investigate the impact of other machine learning and deep learning models.
Multimodal Approach: Incorporate image and video analysis alongside text analysis.
Transfer Learning: Leverage pre-trained models to improve performance and reduce training times.
Explainability: Integrate techniques like LIME (Local Interpretable Model-Agnostic Explanations) to understand model decision-making processes.
Real-time Application: Develop real-time detection systems for social media platforms and news websites.

Contact

For questions or collaborations, contact reshmi14@uw.edu

