This project provides a command-line tool that helps detect whether a news article is real or fake. It uses a machine learning model trained on a large dataset to analyze text and predict its authenticity with a high degree of accuracy.
The primary goal of this project is to help curb the spread of misleading news by classifying articles as either Fake or Real. It does this with Natural Language Processing (NLP) and a classic machine learning model that differentiates between real and false content.
The model was trained on a substantial dataset consisting of 44,898 news articles. The content of this dataset is primarily from 2017, providing a robust, albeit time-specific, foundation for the model's classifications. The project successfully demonstrates the viability of this approach, achieving an accuracy of over 98.5% on its test data.
- High-Accuracy Predictions: Classifies articles as `REAL` or `FAKE` with a proven accuracy of over 98.5%.
- Prediction Confidence: Provides a confidence score (e.g., `99.18%`) with each prediction, indicating the model's certainty.
- Modular Structure: The project is cleanly divided into a training script (`train_model.py`) and a prediction application (`app.py`).
- Efficient Workflow: The training process only needs to be run once. It saves a reusable model and vectorizer, allowing the prediction app to load and run instantly.
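The train-once, load-later workflow can be sketched with `joblib`, which is listed in the project's requirements. This is a minimal illustration, not the project's actual code: the toy sentences, the file names `model.joblib` and `vectorizer.joblib`, and the use of a temporary directory in place of `saved_models` are all assumptions.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (illustrative only, not taken from the project's dataset).
texts = [
    "officials confirm the budget vote is scheduled for tuesday",
    "the ministry published its quarterly trade figures today",
    "shocking secret cure that doctors do not want you to know",
    "you will not believe what this celebrity secretly did",
]
labels = ["REAL", "REAL", "FAKE", "FAKE"]

# Training step: fit the vectorizer and the classifier once.
vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(texts), labels)

# Persist both objects. The file names are assumptions, and a temporary
# directory stands in for the project's saved_models directory.
model_dir = Path(tempfile.mkdtemp()) / "saved_models"
model_dir.mkdir()
joblib.dump(model, model_dir / "model.joblib")
joblib.dump(vectorizer, model_dir / "vectorizer.joblib")

# Prediction step: reload the saved objects without retraining.
loaded_model = joblib.load(model_dir / "model.joblib")
loaded_vectorizer = joblib.load(model_dir / "vectorizer.joblib")

sample = ["the ministry confirmed the figures on tuesday"]
original_pred = model.predict(vectorizer.transform(sample))[0]
reloaded_pred = loaded_model.predict(loaded_vectorizer.transform(sample))[0]
print(original_pred, reloaded_pred)
```

The vectorizer has to be saved alongside the model because the classifier only understands the vocabulary and weighting learned during training.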
Potential improvements and future directions for the project include:
- Real-Time News Analysis: Integrate with live news APIs (e.g., NewsAPI, Google's Fact Check Tools API) to fetch and classify current events, expanding its relevance beyond the static 2017 dataset.
- Web Interface: Develop a user-friendly web application using a framework like Flask or Streamlit to make the tool accessible to a broader audience.
- Advanced Models: Explore more complex deep learning models, such as LSTMs or pre-trained transformers (e.g., BERT), to potentially improve nuance and accuracy.
This project requires Python 3.8 or higher and the following libraries, which are listed in `requirements.txt`:
- pandas
- scikit-learn
- imbalanced-learn
- joblib
Follow these steps to get the project running locally.

- Clone this repository:

      git clone https://github.com/deepikagandla7456/Fake-News-Detection.git
      cd Fake-News-Detection

- Install the required packages: It is recommended to use a virtual environment.

      pip install -r requirements.txt

- Train the model: This is a crucial step that must be run first. This script will process the data and create the `saved_models` directory.

      python train_model.py

After the model has been trained and saved, the application can be used for predictions.
- Run the application script:

      python app.py

- Enter News Text: The program will prompt you to paste the text of a news article.
- Receive the Prediction: The model will return its classification (`REAL` or `FAKE`) and its confidence level. Type `quit` or `exit` to close the application.
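The prompt-and-predict loop described above could look roughly like the following sketch. It is not the project's `app.py`: the names `run_session` and `dummy_classify` are hypothetical, and the classifier is a stand-in so that the loop logic can be read and run without a trained model.

```python
from typing import Callable, Iterable, List, Tuple


def run_session(
    lines: Iterable[str],
    classify: Callable[[str], Tuple[str, float]],
) -> List[str]:
    """Classify each input line until 'quit' or 'exit' is entered."""
    outputs = []
    for raw in lines:
        text = raw.strip()
        if text.lower() in {"quit", "exit"}:
            break  # stop the session on an exit command
        if not text:
            continue  # ignore blank input
        label, confidence = classify(text)
        outputs.append(f"{label} ({confidence:.2%} confidence)")
    return outputs


def dummy_classify(text: str) -> Tuple[str, float]:
    # Hypothetical stand-in for the trained model and vectorizer.
    if "shocking" in text.lower():
        return ("FAKE", 0.9918)
    return ("REAL", 0.875)


results = run_session(
    ["Shocking miracle cure found", "", "Council approves budget", "quit", "ignored"],
    dummy_classify,
)
for line in results:
    print(line)
```

In an interactive program the lines would come from repeated `input()` calls rather than a list; passing them in as an iterable simply keeps the sketch easy to test.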
- Data Processing: The `train_model.py` script loads the `True.csv` and `False.csv` datasets.
- Feature Engineering: The text from the articles is converted into numerical vectors using a TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer. This technique helps the model understand which words are most important in distinguishing between the two classes.
- Model Training: A Logistic Regression classifier is trained on the vectorized data.
- Application: The `app.py` script loads the pre-trained model and vectorizer to perform live predictions on new, user-provided text.
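The four stages above can be sketched end to end with pandas and scikit-learn. This is an illustration under stated assumptions rather than the project's implementation: small in-memory data frames stand in for the two CSV files, the `text` column name, the split ratio, and the helper `predict_with_confidence` are hypothetical, and the toy data is far too small for the resulting accuracy to be meaningful.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Data processing: in-memory stand-ins for the two CSV files.
# The "text" column name is an assumption.
real_df = pd.DataFrame({"text": [
    "the senate passed the spending bill on thursday",
    "the central bank left interest rates unchanged",
    "officials confirmed the election date this week",
    "the ministry published quarterly trade figures",
]})
fake_df = pd.DataFrame({"text": [
    "shocking secret cure doctors do not want you to know",
    "you will not believe what this celebrity secretly did",
    "shocking proof the moon landing was staged",
    "secret plot revealed in this unbelievable leaked memo",
]})
real_df["label"] = "REAL"
fake_df["label"] = "FAKE"
data = pd.concat([real_df, fake_df], ignore_index=True)

# Hold out part of the data for evaluation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    data["text"], data["label"],
    test_size=0.25, stratify=data["label"], random_state=42,
)

# Feature engineering: learn TF-IDF weights on the training text only.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Model training: Logistic Regression on the TF-IDF vectors.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test_vec))


def predict_with_confidence(text):
    """Return the predicted label and the probability assigned to it."""
    probs = model.predict_proba(vectorizer.transform([text]))[0]
    best = probs.argmax()
    return str(model.classes_[best]), float(probs[best])


# Application: classify new, user-provided text.
label, confidence = predict_with_confidence("shocking secret memo revealed")
print(f"{label} ({confidence:.2%} confidence), test accuracy {accuracy:.2%}")
```

The confidence here is simply the highest class probability from `predict_proba`, which is one common way to produce a score like the one the application reports.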
Model Training and Evaluation Output
Article Prediction Output
This project is licensed under the MIT License; see the LICENSE file for details.