This project implements a fake news detection system using machine learning algorithms, specifically Naive Bayes and Support Vector Machine (SVM) classifiers. The system processes text data and classifies news articles as either "FAKE" or "REAL".
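The project is organized into three main Python files, each handling a different stage of the pipeline: `index.py` (data preprocessing), `naive_bayes_model.py` (Naive Bayes classifier), and `svm_model.py` (SVM classifier).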
```python
class DataPreprocessor:
    def __init__(self, data_path):
        self.data_path = data_path
        # ...
```
Key functionalities:
- Data loading and cleaning
- Feature preparation
- Text processing pipeline setup
- Train-test split management
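The listing above elides the preprocessing methods. One plausible shape for the loading, cleaning, and splitting steps is sketched below; it is not the project's actual code. The column names `text` and `label`, the cleaning rules, the 80/20 stratified split, and the helper names `clean_frame`, `load_and_clean`, and `split_features` are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumption: the CSV has a free-text column "text" and a label column
# "label" whose values are "FAKE" or "REAL" (case-insensitive).
TEXT_COL = "text"
LABEL_COL = "label"


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing or empty text, normalize labels, keep FAKE/REAL only."""
    df = df.dropna(subset=[TEXT_COL, LABEL_COL]).copy()
    df[TEXT_COL] = df[TEXT_COL].astype(str).str.strip()
    df[LABEL_COL] = df[LABEL_COL].astype(str).str.strip().str.upper()
    df = df[df[TEXT_COL] != ""]
    df = df[df[LABEL_COL].isin(["FAKE", "REAL"])]
    return df.reset_index(drop=True)


def load_and_clean(path: str) -> pd.DataFrame:
    """Read the CSV at `path` and apply clean_frame."""
    return clean_frame(pd.read_csv(path))


def split_features(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Return X_train, X_test, y_train, y_test, stratified so both splits keep the class ratio."""
    return train_test_split(
        df[TEXT_COL],
        df[LABEL_COL],
        test_size=test_size,
        random_state=random_state,
        stratify=df[LABEL_COL],
    )
```

Stratifying on the label keeps the FAKE/REAL ratio similar in the training and test sets, which makes accuracy figures on the test split easier to compare between models.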
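```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

class NaiveBayesClassifier:
    def __init__(self):
        # Raw text -> token counts -> TF-IDF weights -> multinomial Naive Bayes
        self.pipeline = Pipeline([
            ('vect', CountVectorizer()),
            ('tfidf', TfidfTransformer()),
            ('clf', MultinomialNB())
        ])
```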
Features:
- Text vectorization using CountVectorizer
- TF-IDF transformation
- Model training and evaluation
- Single text prediction capability
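Because the pipeline chains three scikit-learn estimators, a single `fit` call learns the vocabulary, the IDF weights, and the Naive Bayes parameters together. The sketch below fits the same stages on a tiny hypothetical corpus and shows one way single-text prediction can work: wrap the string in a list, since `Pipeline.predict` expects a sequence of documents. The corpus and the `predict_single` helper are illustrative assumptions, not the project's code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Same three stages as NaiveBayesClassifier.pipeline.
pipeline = Pipeline([
    ("vect", CountVectorizer()),    # raw text -> token counts
    ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
    ("clf", MultinomialNB()),       # multinomial Naive Bayes on TF-IDF features
])

# Hypothetical toy corpus, for illustration only.
texts = [
    "shocking miracle cure doctors hate revealed",
    "you won't believe this secret celebrity scandal",
    "miracle pill melts fat overnight shocking",
    "parliament passes budget after lengthy debate",
    "central bank holds interest rates steady",
    "city council approves new public transit plan",
]
labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]

pipeline.fit(texts, labels)


def predict_single(text: str) -> str:
    """Classify one article: wrap it in a list and take the first prediction."""
    return pipeline.predict([text])[0]


print(predict_single("shocking miracle secret revealed"))  # FAKE
print(predict_single("central bank approves budget"))      # REAL
```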
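```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

class SVMClassifier:
    def __init__(self, kernel='linear'):
        # Same text stages as the Naive Bayes model, with an SVM as the final step
        self.pipeline = Pipeline([
            ('vect', CountVectorizer()),
            ('tfidf', TfidfTransformer()),
            ('clf', SVC(kernel=kernel))
        ])
```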
Features:
- Linear-kernel SVM by default (the kernel is configurable through the `kernel` argument)
- Text preprocessing pipeline
- Model evaluation metrics
- Prediction functionality
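The `kernel` argument is passed straight through to scikit-learn's `SVC`, so other kernels such as `'rbf'` can be swapped in without touching the text-processing stages. The sketch below assumes a hypothetical `make_svm_pipeline` helper and a toy corpus; it also uses `SVC.decision_function`, which returns a signed distance from the separating hyperplane, as a rough confidence signal alongside the predicted label.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def make_svm_pipeline(kernel: str = "linear") -> Pipeline:
    """Build the same three-stage pipeline as SVMClassifier, with a configurable kernel."""
    return Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", SVC(kernel=kernel)),
    ])


# Hypothetical toy corpus, for illustration only.
texts = [
    "shocking miracle cure doctors hate revealed",
    "you won't believe this secret celebrity scandal",
    "miracle pill melts fat overnight shocking",
    "parliament passes budget after lengthy debate",
    "central bank holds interest rates steady",
    "city council approves new public transit plan",
]
labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]

svm = make_svm_pipeline()  # linear kernel, matching the class default
svm.fit(texts, labels)

sample = "central bank approves budget"
label = svm.predict([sample])[0]
# For two classes, decision_function returns one signed score per document;
# positive values favour svm.classes_[1] ("REAL" here), negative favour "FAKE".
score = svm.decision_function([sample])[0]
print(label, round(float(score), 3))
```

Larger absolute scores mean the document lies further from the decision boundary, but they are not calibrated probabilities.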
Both models are evaluated using:
- Accuracy Score
- Confusion Matrix
- Classification Report
- Training and Prediction Time
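A minimal sketch of how these four measurements can be collected for either pipeline with `sklearn.metrics` and the standard `time` module. The helper name `evaluate_with_timing`, the use of `time.perf_counter`, and the toy data are assumptions; the project's own `evaluate` method may compute and report these differently.

```python
import time

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

LABELS = ["FAKE", "REAL"]  # fixes the row/column order of the confusion matrix


def evaluate_with_timing(pipeline, X_train, y_train, X_test, y_test):
    """Fit and score a pipeline, timing training and prediction separately."""
    start = time.perf_counter()
    pipeline.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = pipeline.predict(X_test)
    predict_time = time.perf_counter() - start

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred, labels=LABELS),
        "report": classification_report(y_test, y_pred, labels=LABELS, zero_division=0),
        "train_time_s": train_time,
        "predict_time_s": predict_time,
    }


# Hypothetical toy data, for illustration only.
X_train = [
    "shocking miracle cure doctors hate revealed",
    "you won't believe this secret celebrity scandal",
    "miracle pill melts fat overnight shocking",
    "parliament passes budget after lengthy debate",
    "central bank holds interest rates steady",
    "city council approves new public transit plan",
]
y_train = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]
X_test = ["shocking miracle secret revealed", "central bank approves budget"]
y_test = ["FAKE", "REAL"]

nb_pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
result = evaluate_with_timing(nb_pipeline, X_train, y_train, X_test, y_test)
print(f"accuracy={result['accuracy']:.2f}  train={result['train_time_s']:.4f}s")
print(result["confusion_matrix"])
print(result["report"])
```

Passing the same `labels` list to both metric functions keeps the confusion-matrix rows and columns in the same order for every model, which makes side-by-side comparison easier.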
Results:
Naive Bayes:
- Accuracy: ~95.58%
- Training Time: ~0.03 seconds
- Prediction Time: ~0.015 seconds
SVM:
- Accuracy: ~93.05%
- Training Time: ~51.82 seconds
- Prediction Time: ~16.56 seconds
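Assuming the first set of figures belongs to Naive Bayes and the second to SVM (consistent with the conclusion that Naive Bayes is the faster model), the trade-off can be put in relative terms with a little arithmetic. The values are copied from the approximate results above; timings this short are sensitive to hardware and run-to-run noise, so the ratios are indicative only.

```python
# Reported figures (approximate), assuming first set = Naive Bayes, second = SVM.
nb = {"accuracy": 95.58, "train_s": 0.03, "predict_s": 0.015}
svm = {"accuracy": 93.05, "train_s": 51.82, "predict_s": 16.56}

train_ratio = svm["train_s"] / nb["train_s"]        # about 1727x
predict_ratio = svm["predict_s"] / nb["predict_s"]  # about 1104x
accuracy_gap = nb["accuracy"] - svm["accuracy"]     # about 2.53 percentage points

print(f"SVM trains ~{train_ratio:.0f}x slower and predicts ~{predict_ratio:.0f}x slower")
print(f"Naive Bayes accuracy is higher by {accuracy_gap:.2f} percentage points")
```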
Usage:
```python
from index import DataPreprocessor
from naive_bayes_model import NaiveBayesClassifier
from svm_model import SVMClassifier

# Initialize the preprocessor, load the dataset, and split it
preprocessor = DataPreprocessor('path_to_your_dataset.csv')
df = preprocessor.load_data()
X_train, X_test, y_train, y_test = preprocessor.prepare_features()

# Train and evaluate Naive Bayes
nb_classifier = NaiveBayesClassifier()
nb_classifier.train(X_train, y_train)
nb_classifier.evaluate(X_test, y_test)

# Train and evaluate SVM (linear kernel by default)
svm_classifier = SVMClassifier()
svm_classifier.train(X_train, y_train)
svm_classifier.evaluate(X_test, y_test)

# For single text prediction (either trained classifier works)
text = "Your news article text here"
prediction = nb_classifier.predict_single(text)
```
Key features:
- Modular and maintainable code structure
- Comprehensive evaluation metrics
- Easy-to-use interface
- Support for both batch and single-text predictions
- Performance timing measurements
Requirements:
- pandas
- scikit-learn
- numpy
- time (Python standard library; no installation needed)
Future improvements:
- Add support for more classification algorithms
- Implement cross-validation
- Add feature importance analysis
- Implement model persistence
- Add data visualization components
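Two of the listed improvements, cross-validation and model persistence, can be prototyped with standard scikit-learn and Python tools. The sketch below is one possible approach, not a plan from the project: it scores a pipeline with `cross_val_score` over a stratified 3-fold split and round-trips a fitted pipeline through `pickle`. The toy corpus, the `build_pipeline` helper, and the fold count are assumptions; on the full dataset, 5 or 10 folds would be a more common choice.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_pipeline() -> Pipeline:
    """Hypothetical helper: a fresh, unfitted copy of the Naive Bayes pipeline."""
    return Pipeline([
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
        ("clf", MultinomialNB()),
    ])


# Hypothetical toy corpus, for illustration only.
texts = [
    "shocking miracle cure doctors hate revealed",
    "you won't believe this secret celebrity scandal",
    "miracle pill melts fat overnight shocking",
    "parliament passes budget after lengthy debate",
    "central bank holds interest rates steady",
    "city council approves new public transit plan",
]
labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]

# Cross-validation: the whole pipeline is refit inside each fold, so the
# vocabulary and IDF weights never see that fold's test documents.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(build_pipeline(), texts, labels, cv=cv, scoring="accuracy")
print(f"fold accuracies: {scores}, mean={scores.mean():.2f}")

# Persistence: serialize the fitted pipeline (vectorizer + model) as one object.
model = build_pipeline().fit(texts, labels)
blob = pickle.dumps(model)  # write these bytes to a file to reuse the model later
restored = pickle.loads(blob)
print("restored model agrees:", list(restored.predict(texts)) == list(model.predict(texts)))
```

Pickling the whole pipeline, rather than only the classifier, keeps the fitted vocabulary and IDF weights together with the model. Pickled files should only be loaded from trusted sources.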
For more details and the accompanying slides, see the presentation: [Fake News Detection Using ML Presentation]
Conclusions:
- Both models exceed 93% accuracy on the held-out test set
- Naive Bayes trains and predicts roughly three orders of magnitude faster than SVM
- SVM provides slightly better precision but with longer processing times
- The results suggest the approach is practical for real-world news classification, within the limits noted below
Note: This project is for educational purposes and should be used as part of a broader fact-checking strategy.