A machine learning project that builds and evaluates models to classify news articles as fake or real based on their content. The dataset includes labeled text samples from actual and fake news sources.
- Fake.csv: Dataset containing fake news samples.
- True.csv: Dataset containing real news samples.
- Source: Kaggle - Fake and Real News Dataset
- Notebook: Contains data preprocessing, model training, evaluation, and testing logic.
- Python 3.x
- Pandas
- NumPy
- Seaborn & Matplotlib
- Scikit-learn
- Loads and merges real and fake news datasets.
- Cleans and preprocesses news content.
- Converts text to numerical features using `TfidfVectorizer`.
- Splits data into training and testing sets (70/30 split).
- Trains and evaluates two machine learning models:
  - Logistic Regression
  - Passive Aggressive Classifier
- Prints accuracy, classification report, and confusion matrix for both models.
- Includes manual testing on reserved samples.
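
The steps above can be sketched end to end as follows. This is a minimal illustration rather than the notebook's code: the toy in-memory corpus stands in for the merged `Fake.csv`/`True.csv` data, and the variable names, `random_state`, `max_iter`, and the `stratify` argument are assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the merged dataset (0 = fake, 1 = real).
fake_texts = [f"shocking secret cure number {i} doctors hate" for i in range(10)]
real_texts = [f"senate committee approves budget bill number {i}" for i in range(10)]
df = pd.DataFrame({
    "text": fake_texts + real_texts,
    "class": [0] * 10 + [1] * 10,
})

# 70/30 split; stratifying (an assumption) keeps both classes in each part.
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["class"], test_size=0.3, random_state=42, stratify=df["class"]
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Passive Aggressive": PassiveAggressiveClassifier(max_iter=1000, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(xv_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(xv_test))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary out of the learned features, which makes the reported accuracy a fairer estimate.
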
- Merged fake and real datasets, labeling:
  - Fake: `class = 0`
  - Real: `class = 1`
- Dropped metadata columns: `title`, `subject`, `date`.
- Cleaned text using:
  - Lowercasing
  - Removing punctuation and stopwords
  - Removing hyperlinks, mentions, and extra spaces
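
The labeling and column-dropping steps above might look roughly like this. It is a sketch under assumptions: the one-row frames stand in for `pd.read_csv("Fake.csv")` and `pd.read_csv("True.csv")`, the `text` column name and the variable names are assumed, and the final shuffle is an illustrative extra that the description above does not mention.

```python
import pandas as pd

# Tiny stand-ins for the two CSV files (column layout is an assumption).
columns = ["title", "text", "subject", "date"]
data_fake = pd.DataFrame(
    [["Headline A", "Fabricated story text", "News", "January 1, 2017"]],
    columns=columns,
)
data_true = pd.DataFrame(
    [["Headline B", "Verified story text", "politicsNews", "January 2, 2017"]],
    columns=columns,
)

# Label each source before merging: 0 = fake, 1 = real.
data_fake["class"] = 0
data_true["class"] = 1

# Merge the two frames and drop the metadata columns.
data = pd.concat([data_fake, data_true], ignore_index=True)
data = data.drop(columns=["title", "subject", "date"])

# Shuffle so the classes are interleaved (assumed step, fixed seed for repeatability).
data = data.sample(frac=1, random_state=42).reset_index(drop=True)

print(data)
```
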
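
The cleaning steps are implemented in `wordopt`:

    import re
    import string

    def wordopt(text):
        """Normalize raw article text before vectorization."""
        text = text.lower()
        text = re.sub(r'\[.*?\]', '', text)    # drop bracketed spans such as [video]
        text = re.sub(r'\W', ' ', text)        # replace non-word characters with spaces
        # The \W pass above has already replaced ':', '/', '.', '<' and '>' with
        # spaces, so the URL and HTML-tag patterns below only match if moved above it.
        text = re.sub(r'https?://\S+|www\.\S+', '', text)
        text = re.sub(r'<.*?>+', '', text)
        text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
        text = re.sub(r'\n', '', text)
        text = re.sub(r'\w*\d\w*', '', text)   # drop tokens containing digits
        return text

- Used `TfidfVectorizer` to transform text into numerical feature vectors.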
- Logistic Regression: trained on the TF-IDF features; accuracy and classification report printed.
- Second classifier: evaluated on the same TF-IDF features, with results compared against Logistic Regression for performance insights.
- Accuracy Score
- Confusion Matrix
- Classification Report (Precision, Recall, F1-score)
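
The three metrics listed above can be computed with `sklearn.metrics`. The labels below are hypothetical and exist only to show the calls; they are not results from this project.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels: 0 = fake, 1 = real.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class

print(f"Accuracy: {accuracy:.2f}")
print(matrix)
print(classification_report(y_true, y_pred, target_names=["fake", "real"]))
```

In the confusion matrix, the off-diagonal cells count real articles predicted as fake and fake articles predicted as real, which accuracy alone does not distinguish.
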
- Logistic Regression Accuracy: ~98%
- Decision Tree Classifier Accuracy: ~99%
- Confusion matrix and report available in the notebook.
Ten fake and ten real news entries were held out and tested against the trained models to check generalization.
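
One way to reserve such samples is sketched below. Taking the last 10 rows of each frame is an assumption, as are the toy frames and variable names; the notebook may select its reserved entries differently.

```python
import pandas as pd

# Toy stand-ins for the two labeled source frames (25 rows each).
data_fake = pd.DataFrame({"text": [f"fake story {i}" for i in range(25)], "class": 0})
data_true = pd.DataFrame({"text": [f"real story {i}" for i in range(25)], "class": 1})

# Reserve the last 10 rows of each frame for manual testing.
manual_fake = data_fake.tail(10)
manual_true = data_true.tail(10)
manual_test = pd.concat([manual_fake, manual_true], ignore_index=True)

# Remove the reserved rows from the data used for training and evaluation.
train_pool = pd.concat(
    [data_fake.drop(manual_fake.index), data_true.drop(manual_true.index)],
    ignore_index=True,
)

print(len(manual_test), len(train_pool))
```

Each reserved text would then go through the same cleaning function and the already fitted vectorizer's `transform` before being passed to a model's `predict`.
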
- Clone the repository: `git clone https://github.com/PankajKumar-11/fake-news-detection.git`, then `cd fake-news-detection`.
- Install dependencies: `pip install -r requirements.txt`
- Run the notebook using Jupyter or JupyterLab.
This project is open-source and available under the MIT License.
Pankaj Kumar
GitHub: @PankajKumar-11


