ramachandra742/document-classification-ML

Document-Classification

This project classifies documents by category. The corpus contains 12 categories:
'books', 'cinema', 'cooking', 'gaming', 'sports', 'tech', 'data_science', 'design', 'news', 'politics', 'do_it_yourself', and 'business'. Logistic regression, SVC, and MultinomialNB are used to classify the documents.
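
As a rough illustration of how the three algorithms named above can be applied to text, here is a minimal sketch using scikit-learn. It is not the repository's actual code: the TF-IDF features, the toy training documents, the `fit_and_predict` helper, and the hyperparameters are all assumptions made for the example, and only three of the 12 categories are shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; the real dataset is much larger and has 12 categories.
train_docs = [
    "simmer the sauce and bake the bread in the oven",
    "chop the onions and fry them with garlic",
    "the team scored a late goal to win the match",
    "the striker was injured during the football game",
    "the new laptop ships with a faster processor",
    "the phone update fixes a software security bug",
]
train_labels = ["cooking", "cooking", "sports", "sports", "tech", "tech"]

# The three classifiers mentioned in this README (settings are assumptions).
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": SVC(kernel="linear"),
    "MultinomialNB": MultinomialNB(),
}


def fit_and_predict(docs, labels, new_docs):
    """Fit each classifier on TF-IDF features and predict labels for new_docs."""
    predictions = {}
    for name, clf in models.items():
        # Each pipeline gets its own vectorizer so the models stay independent.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(docs, labels)
        predictions[name] = list(pipeline.predict(new_docs))
    return predictions


predictions = fit_and_predict(
    train_docs, train_labels, ["bake the bread with garlic"]
)
for name, predicted in predictions.items():
    print(name, predicted)
```

Wrapping the vectorizer and classifier in one pipeline keeps feature extraction and model fitting together, so the same transformation is applied at training and prediction time.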

Installation

Clone this repo:

git clone git@github.com:ramachandra742/Document-Classification.git

Download dataset

Check here

References

Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda. Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning.