jotstolu/Text-Classification-Topic-labelling

Text-Classification - Topic-labelling

This project develops and evaluates text classification models that categorize documents into one of five predefined topics: business, entertainment, politics, sport, and tech.

Project Objectives

  1. Exploring text pre-processing techniques to clean the dataset before text vectorization (feature extraction).
  2. Applying different text vectorization techniques, such as TF-IDF, bag of words, and word embeddings from pre-trained models.
  3. Splitting the dataset into 70% training and 30% testing.
  4. Training machine learning classifiers, such as support vector machines (SVMs), Naïve Bayes, and neural networks, on each text vectorization technique.
  5. Evaluating the classifiers on the test dataset using appropriate metrics, such as accuracy, precision, recall, and F1-score.
  6. Comparing how each classifier performs across the different text vectorization techniques.
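
The vectorization and evaluation objectives above can be illustrated with a minimal, standard-library-only sketch. It is not the project's own code: the function names (`tokenize`, `tfidf_vectors`, `per_class_metrics`), the regex tokenizer, and the plain `tf * log(N / df)` weighting are assumptions chosen for illustration, and that weighting is only one of several common TF-IDF variants. In practice a library such as scikit-learn would typically handle both steps.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (an assumed, simple pre-processing step)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words counts for this document
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and a {label: (precision, recall, f1)} dict."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1)
    return accuracy, report
```

A term that occurs in every document receives a weight of zero under this weighting, which is why common words contribute little to the resulting vectors. Calling `per_class_metrics(true_labels, predicted_labels)` on the test split returns the figures needed to compare classifiers across vectorization techniques.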

Background information

The dataset was downloaded from the Kaggle website and extracted from its zip archive. It contains five sub-folders (business, entertainment, politics, sport, and tech), each holding documents from that domain. The number of documents in each sub-folder is:

  • Business - 510
  • Entertainment - 386
  • Politics - 417
  • Sport - 511
  • Tech - 401
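
The folder layout described above maps naturally onto a loader that uses each sub-folder name as the label. The following is a minimal sketch under stated assumptions: that the documents are `.txt` files (the file extension is not given here), that the sub-folder names match the lowercase category names, and that the 70/30 split is stratified per category (the split method is not specified). The names `load_corpus` and `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict
from pathlib import Path

# Assumed to match the sub-folder names of the extracted dataset.
CATEGORIES = ["business", "entertainment", "politics", "sport", "tech"]


def load_corpus(root):
    """Read every .txt file under root/<category>/ and return (texts, labels)."""
    texts, labels = [], []
    for category in CATEGORIES:
        for path in sorted(Path(root, category).glob("*.txt")):
            # errors="replace" avoids crashing on unexpected byte sequences.
            texts.append(path.read_text(encoding="utf-8", errors="replace"))
            labels.append(category)
    return texts, labels


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_indices, test_indices), drawing test_fraction of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

With the document counts listed above (2,225 in total) and this per-class rounding, the split yields 1,558 training and 667 test documents. Stratifying keeps the category proportions the same in both splits, which matters here because the categories are not equally sized.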
