This repository contains the code for a Fake News prediction system using Logistic Regression. The code is based on a Jupyter notebook originally generated by Colab.
This code performs the following tasks:
- Import Libraries: Imports necessary libraries like pandas, numpy, nltk etc. for data manipulation, text processing and machine learning.
- Data Preprocessing:
  - Loads the training data (`train.csv`) into a pandas dataframe.
  - Handles missing values by replacing them with empty strings.
  - Combines author name and title into a single "content" column.
  - Separates the data (content) and the target label (fake/real).
  - Applies stemming to reduce words to their root form and removes stopwords (common words like "the", "and").
  - Converts textual data into numerical features using TF-IDF vectorizer.
- Train-Test Split: Splits the data into training and testing sets for model evaluation.
- Model Training: Trains a Logistic Regression model on the training data.
- Evaluation:
  - Evaluates the model's accuracy on both training and testing data.
- Prediction:
  - Makes a prediction on a new unseen piece of text data (example from the testing set).
  - Classifies the news as Real or Fake based on the prediction.
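
The tasks listed above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the column names `author`, `title`, and `label`, the convention that `1` means Fake and `0` means Real, the split parameters, and the tiny inline dataframe standing in for `train.csv` are all assumptions made so the example runs on its own.

```python
import re

import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Use NLTK's stopword corpus if it has been downloaded
# (nltk.download("stopwords")). The fallback to scikit-learn's built-in
# list is an assumption of this sketch so it also runs offline.
try:
    from nltk.corpus import stopwords
    STOP_WORDS = set(stopwords.words("english"))
except LookupError:
    STOP_WORDS = set(ENGLISH_STOP_WORDS)

stemmer = PorterStemmer()


def stem_text(text):
    """Keep letters only, lowercase, drop stopwords, and stem each word."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in STOP_WORDS)


# Hypothetical toy rows standing in for pd.read_csv("train.csv").
# Assumed label convention: 1 = Fake, 0 = Real.
df = pd.DataFrame({
    "author": ["Jane Roe", None, "John Doe", "Anon Blogger",
               "Jane Roe", "Anon Blogger", "John Doe", None],
    "title": ["City council approves new budget",
              "Aliens secretly control the weather",
              "Local school wins science award",
              "Miracle pill cures every disease overnight",
              "Council debates road repairs",
              "Secret moon base discovered by insiders",
              "Science fair draws record crowd",
              "Weather machine hidden under the ocean"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Replace missing values with empty strings, then build the "content" column.
df = df.fillna("")
df["content"] = (df["author"] + " " + df["title"]).apply(stem_text)

# Separate the text from the target label.
X_text = df["content"].values
y = df["label"].values

# Convert the text into TF-IDF features.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(X_text)

# Split into training and testing sets (parameters are illustrative).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2
)

# Train the Logistic Regression model.
model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy on both the training and the testing data.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"Training accuracy: {train_acc:.2f}")
print(f"Testing accuracy:  {test_acc:.2f}")

# Predict a single sample from the testing set and classify it.
prediction = model.predict(X_test[:1])
print("Fake" if prediction[0] == 1 else "Real")
```

The sketch fits the TF-IDF vectorizer before splitting, mirroring the order of the steps listed above. Fitting it on the training split only would keep test-set vocabulary statistics out of training. Accuracy figures from a toy dataframe this small say nothing about performance on the real data.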
This code is intended to be run in a Jupyter Notebook environment. You can follow these steps:
- Download the code and data files.
- Open the `Fake_News_Prediction.ipynb` file in a Jupyter Notebook environment.
- Run the code cells sequentially.
The code requires the following:
- Python 3.x
- pandas
- numpy
- nltk
- scikit-learn
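
Before running the notebook, it can help to confirm that these packages are present. The snippet below is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_versions` is hypothetical, and the package names are the distribution names as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names of the third-party packages listed above.
REQUIRED = ["pandas", "numpy", "nltk", "scikit-learn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```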