PankajKumar-11/FakeNews-Detection

📰 Fake News Detection System

A machine learning project that builds and evaluates models to classify news articles as fake or real based on their text. The dataset consists of labeled articles drawn from real and fake news sources.

📁 Project Structure

  • Fake.csv: Dataset containing fake news samples.
  • True.csv: Dataset containing real news samples.
  • Source: Kaggle - Fake and Real News Dataset
  • Notebook: Contains data preprocessing, model training, evaluation, and testing logic.

⚙️ Technologies Used

  • Python 3.x
  • Pandas
  • NumPy
  • Seaborn & Matplotlib
  • Scikit-learn

🚀 Features

  • Loads and merges real and fake news datasets.
  • Cleans and preprocesses news content.
  • Converts text to numerical features using TfidfVectorizer.
  • Splits data into training and testing sets (70/30 split).
  • Trains and evaluates two machine learning models:
    • Logistic Regression
    • Decision Tree Classifier
  • Prints accuracy, classification report, and confusion matrix for both models.
  • Includes manual testing on reserved samples.

🧪 Data Preprocessing Steps

  1. Merged fake and real datasets, labeling:
    • Fake: class = 0
    • Real: class = 1
  2. Dropped metadata columns: title, subject, date.
  3. Cleaned text using:
    • Lowercasing
    • Removing bracketed text, punctuation, and tokens that contain digits
    • Removing hyperlinks, HTML tags, and newlines
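import re
import string

def wordopt(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)                # drop bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # drop URLs while "://" and "." are still present
    text = re.sub(r'<.*?>+', '', text)                 # drop HTML tags
    text = re.sub(r'\W', ' ', text)                    # replace non-word characters with spaces
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # leftover punctuation such as "_"
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)               # drop tokens that contain digits
    return text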
  4. Used TfidfVectorizer to transform the cleaned text into numerical feature vectors.
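The labeling, column-dropping, and vectorizing steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the rows are invented stand-ins for Fake.csv and True.csv, and the `text` column name is an assumption based on the Kaggle dataset layout.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented stand-ins for Fake.csv and True.csv; the "text" column name is assumed.
fake = pd.DataFrame({
    "title": ["t1", "t2"],
    "text": ["aliens endorse candidate in secret vote",
             "miracle pill cures everything overnight"],
    "subject": ["News", "News"],
    "date": ["2017-01-01", "2017-01-02"],
})
real = pd.DataFrame({
    "title": ["t3", "t4"],
    "text": ["senate passes budget bill after debate",
             "central bank holds interest rates steady"],
    "subject": ["politicsNews", "worldnews"],
    "date": ["2017-01-03", "2017-01-04"],
})

# Label each source (fake = 0, real = 1), then merge into one frame.
fake["class"] = 0
real["class"] = 1
merged = pd.concat([fake, real], axis=0, ignore_index=True)

# Drop the metadata columns so only the article text and its label remain.
merged = merged.drop(columns=["title", "subject", "date"])

# TF-IDF turns each document into one sparse row of term weights.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(merged["text"])

print(merged.columns.tolist())
print(features.shape)
```

In the real pipeline the cleaning function would be applied to the `text` column before vectorizing, for example with `merged["text"].apply(wordopt)`.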

🤖 Models Used

1. Logistic Regression

  • Trained on the TF-IDF features.
  • Accuracy and the classification report are printed after training.

2. Decision Tree Classifier

  • Alternative model trained and evaluated on the same TF-IDF features.
  • Results are compared against Logistic Regression.
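A rough sketch of training both models on a 70/30 split might look like the following. The toy sentences, `random_state`, `max_iter`, and the choice to fit the vectorizer on the training portion only are all assumptions made for illustration; the notebook may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus: 0 = fake, 1 = real.
texts = [
    "shocking secret cure doctors hate",
    "aliens rigged the election say insiders",
    "celebrity clone spotted in secret bunker",
    "miracle diet melts fat overnight",
    "moon landing staged claims anonymous blog",
    "parliament approves annual budget",
    "central bank keeps interest rates unchanged",
    "city council opens new public library",
    "health ministry publishes vaccination figures",
    "court upholds ruling in trade dispute",
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# 70/30 split; stratify keeps both classes present in each part.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

# Fit the vocabulary on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(xv_train, y_train)
    scores[name] = model.score(xv_test, y_test)  # mean accuracy on the held-out 30%
    print(f"{name}: {scores[name]:.2f}")
```

On a corpus this small the scores are not meaningful; the point is only the shape of the workflow.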

📊 Evaluation Metrics

  • Accuracy Score

  • Confusion Matrix

  • Classification Report (Precision, Recall, F1-score)
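As an illustration of how these three metrics relate, the sketch below computes them with `sklearn.metrics` on a small hand-written set of labels. The label vectors are invented and are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented ground truth and predictions: 0 = fake, 1 = real.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

# Fraction of predictions that match the true label.
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

# Per-class precision, recall, and F1-score as formatted text.
report = classification_report(y_true, y_pred, target_names=["fake", "real"])

print(f"accuracy: {acc:.2f}")
print(cm)
print(report)
```

Here 8 of 10 predictions are correct, so accuracy is 0.80, and the confusion matrix shows one fake article predicted as real and one real article predicted as fake.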


✅ Example Output

  • Logistic Regression Accuracy: ~98%

  • Decision Tree Classifier Accuracy: ~99%

  • Confusion matrix and report available in the notebook.



🧪 Manual Testing

Ten fake and ten real news entries are held out of the training data and run through the trained models as a manual check of generalization.
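One way to reserve such samples is sketched below. Taking the last ten rows of each dataset is an assumption (the README only says the entries are reserved), and `reserve_tail` and `output_label` are hypothetical helper names.

```python
import pandas as pd

def reserve_tail(df, n=10):
    """Split off the last n rows; return (remaining, reserved)."""
    reserved = df.tail(n).copy()
    remaining = df.iloc[:-n].copy()
    return remaining, reserved

def output_label(value):
    """Map the numeric class back to a readable label (0 = fake, 1 = real)."""
    return "Fake News" if value == 0 else "Real News"

# Invented placeholder rows standing in for the two datasets.
fake = pd.DataFrame({"text": [f"fake story {i}" for i in range(25)], "class": 0})
real = pd.DataFrame({"text": [f"real story {i}" for i in range(25)], "class": 1})

# Remove the reserved rows before training so the models never see them.
fake_train, fake_manual = reserve_tail(fake)
real_train, real_manual = reserve_tail(real)
manual_set = pd.concat([fake_manual, real_manual], ignore_index=True)

print(len(fake_train), len(real_train), len(manual_set))
print(output_label(manual_set.loc[0, "class"]))
```

Each reserved text would then be cleaned, transformed with the already fitted vectorizer, and passed to each model's `predict` method.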


📌 How to Run

  1. Clone the repository:
git clone https://github.com/PankajKumar-11/FakeNews-Detection.git
cd FakeNews-Detection
  2. Install dependencies:
pip install -r requirements.txt
  3. Open the notebook in Jupyter or JupyterLab and run all cells.

📄 License

This project is open-source and available under the MIT License.


🙋‍♂️ Author

Pankaj Kumar
GitHub: @PankajKumar-11
