📰 Fake News Detection System

A machine learning project that builds and evaluates models to classify news articles as fake or real based on their text. The dataset consists of labeled text samples drawn from real and fake news sources.

📁 Project Structure

Fake.csv: Dataset containing fake news samples.
True.csv: Dataset containing real news samples.
Source: Kaggle - Fake and Real News Dataset
Fake_News-Detection.ipynb: Notebook containing the data preprocessing, model training, evaluation, and testing logic.

⚙️ Technologies Used

Python 3.x
Pandas
NumPy
Seaborn & Matplotlib
Scikit-learn

🚀 Features

Loads and merges real and fake news datasets.
Cleans and preprocesses news content.
Converts text to numerical features using TfidfVectorizer.
Splits data into training and testing sets (70/30 split).
Trains and evaluates two machine learning models:
- Logistic Regression
- Decision Tree Classifier
Prints accuracy, classification report, and confusion matrix for both models.
Includes manual testing on reserved samples.

🧪 Data Preprocessing Steps

Merged fake and real datasets, labeling:
- Fake: class = 0
- Real: class = 1
Dropped metadata columns: title, subject, date.
Cleaned text using:
- Lowercasing
- Removing bracketed text, hyperlinks, and HTML tags
- Removing punctuation and tokens that contain digits

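import re
import string

def wordopt(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)                # drop bracketed segments
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # strip URLs while they are still intact
    text = re.sub(r'<.*?>+', '', text)                 # strip HTML tags
    text = re.sub(r'\W', ' ', text)                    # replace remaining non-word characters (incl. newlines) with spaces
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # removes underscores, which \W leaves in place
    text = re.sub(r'\w*\d\w*', '', text)               # drop tokens that contain digits
    return text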

Used TfidfVectorizer to transform text into numerical feature vectors.
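
The labeling, merging, splitting, and vectorizing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the two small DataFrames stand in for Fake.csv and True.csv, the rows are invented, the `text` column name and `random_state` values are assumptions, and the vectorizer is left at its defaults.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Tiny stand-ins for Fake.csv and True.csv (invented rows, assumed column names).
fake = pd.DataFrame({
    "title": ["f1", "f2", "f3", "f4", "f5"],
    "text": [
        "aliens secretly run the senate",
        "miracle pill cures every disease overnight",
        "celebrity clone spotted at secret base",
        "moon landing footage filmed in a basement",
        "government hides giant lizard in lake",
    ],
    "subject": ["news"] * 5,
    "date": ["2017-01-01"] * 5,
})
true = pd.DataFrame({
    "title": ["t1", "t2", "t3", "t4", "t5"],
    "text": [
        "senate passes annual budget bill",
        "central bank holds interest rates steady",
        "city council approves new transit plan",
        "court schedules hearing for next month",
        "ministry publishes quarterly trade figures",
    ],
    "subject": ["politicsNews"] * 5,
    "date": ["2017-01-01"] * 5,
})

# Label the two sources: 0 = fake, 1 = real.
fake["class"] = 0
true["class"] = 1

# Merge, drop the metadata columns, and shuffle the rows.
data = pd.concat([fake, true], axis=0, ignore_index=True)
data = data.drop(columns=["title", "subject", "date"])
data = data.sample(frac=1, random_state=42).reset_index(drop=True)

# 70/30 train/test split.
x_train, x_test, y_train, y_test = train_test_split(
    data["text"], data["class"], test_size=0.3, random_state=42
)

# Fit TF-IDF on the training text only, then reuse the vocabulary for the test text.
vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

print(xv_train.shape, xv_test.shape)
```

Fitting the vectorizer on the training portion alone keeps test-set vocabulary and document frequencies out of the features the models learn from.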

🤖 Models Used

1. Logistic Regression

Trained on TF-IDF features.
Accuracy and classification report printed.

2. Decision Tree Classifier

Alternative model evaluated on the same features.
Results compared for performance insights.
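
A rough sketch of training both models on the same TF-IDF features is shown below. It assumes default hyperparameters (the notebook's settings are not stated here), uses a handful of invented sentences, and scores on the training data only to keep the example short, so the printed numbers say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus: 0 = fake, 1 = real.
texts = [
    "aliens secretly run the senate",
    "miracle pill cures every disease overnight",
    "moon landing footage filmed in a basement",
    "senate passes annual budget bill",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
]
labels = [0, 0, 0, 1, 1, 1]

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)

# Both models see exactly the same feature matrix.
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(features, labels)
    # Training-set accuracy only; a real comparison would use the held-out 30%.
    scores[name] = model.score(features, labels)
    print(f"{name}: {scores[name]:.2f}")
```

Keeping the models in a dictionary makes it easy to run the same fit-and-score loop over both and compare the results side by side.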

📊 Evaluation Metrics

Accuracy Score
Confusion Matrix
Classification Report (Precision, Recall, F1-score)
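
To make the relationship between these metrics concrete, here is a small worked example in plain Python. It is only an illustration: the label lists are invented, the helper names `confusion_counts` and `summarize` are hypothetical, and "real" (class 1) is treated as the positive class by assumption. Scikit-learn's `accuracy_score`, `confusion_matrix`, and `classification_report` produce these figures directly from a list of true labels and a list of predictions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) with `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 0 = fake, 1 = real.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

print(confusion_counts(y_true, y_pred))  # (5, 1, 3, 1)
print(summarize(y_true, y_pred))
```

Here one fake article is misread as real and one real article as fake, so accuracy is 8/10 while precision and recall for the "real" class are both 5/6.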

✅ Example Output

Logistic Regression Accuracy: ~98%
Decision Tree Classifier Accuracy: ~99%
Confusion matrix and report available in the notebook.

🧪 Manual Testing

10 fake and 10 real news entries are held out of the training data and tested against the models as a check on generalization.
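
One way to reserve such samples is sketched below. Which rows the notebook actually holds out is not stated here, so taking the last 10 rows of each dataset is an assumption, as are the `text` column name, the placeholder rows, and the hypothetical `output_label` helper.

```python
import pandas as pd

# Placeholder rows standing in for the two datasets.
fake = pd.DataFrame({"text": [f"fake article {i}" for i in range(25)]})
true = pd.DataFrame({"text": [f"real article {i}" for i in range(25)]})
fake["class"] = 0
true["class"] = 1

HOLDOUT = 10  # entries reserved per class

# Reserve the last rows of each dataset (assumed choice) ...
fake_manual = fake.tail(HOLDOUT)
true_manual = true.tail(HOLDOUT)

# ... and remove them from what will be used for training and testing.
fake_rest = fake.drop(fake_manual.index)
true_rest = true.drop(true_manual.index)

manual_testing = pd.concat([fake_manual, true_manual], ignore_index=True)
training = pd.concat([fake_rest, true_rest], ignore_index=True)


def output_label(n):
    """Map a predicted class to a readable label (0 = fake, 1 = real)."""
    return "Fake News" if n == 0 else "Real News"


print(len(manual_testing), len(training))
print(output_label(0), "/", output_label(1))
```

Dropping the reserved rows before the train/test split ensures the models never see them, so their predictions on these entries reflect unseen data.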

📌 How to Run

Clone the repository:

git clone https://github.com/PankajKumar-11/FakeNews-Detection.git
cd FakeNews-Detection

Install dependencies:

pip install -r requirements.txt

Run the notebook using Jupyter or JupyterLab.

📄 License

This project is open-source and available under the MIT License.

🙋‍♂️ Author

Pankaj Kumar
GitHub: @PankajKumar-11
