This project classifies documents according to their category in the corpus. The corpus contains 12 categories:
'books', 'cinema', 'cooking', 'gaming', 'sports', 'tech', 'data_science', 'design', 'news', 'politics', 'do_it_yourself', and 'business'.
Three classifiers are used: logistic regression, SVC (support vector classifier), and MultinomialNB (multinomial naive Bayes).
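As a rough illustration of how these three classifiers can be compared on text, here is a minimal sketch. It assumes scikit-learn and TF-IDF features, which this README does not confirm. The toy corpus, the `fit_and_predict` helper, and all parameter choices are hypothetical and are not taken from this repository.

```python
# Sketch only: scikit-learn, TF-IDF features, and the toy corpus below are
# assumptions for illustration, not this repository's actual code or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny hypothetical training set using three of the twelve category names.
train_docs = [
    "the team won the match with a late goal",
    "the striker scored a goal in the final match",
    "simmer the sauce and season the soup with salt",
    "bake the bread and stir the sauce slowly",
    "the new laptop ships with a faster processor",
    "the software update improves the processor and battery",
]
train_labels = ["sports", "sports", "cooking", "cooking", "tech", "tech"]

new_docs = [
    "the goal decided the match",
    "stir the soup and add salt",
    "a laptop with a new processor",
]


def fit_and_predict(train_docs, train_labels, new_docs):
    """Fit each classifier on TF-IDF features and predict labels for new_docs."""
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "SVC": SVC(kernel="linear"),
        "MultinomialNB": MultinomialNB(),
    }
    results = {}
    for name, model in models.items():
        # Each pipeline gets its own vectorizer so the models stay independent.
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), model)
        pipeline.fit(train_docs, train_labels)
        results[name] = [str(label) for label in pipeline.predict(new_docs)]
    return results


predictions = fit_and_predict(train_docs, train_labels, new_docs)
for name, labels in predictions.items():
    print(name, labels)
```

Wrapping the vectorizer and the classifier in one pipeline keeps feature extraction and prediction consistent between training and new documents. On a real corpus, accuracy would be measured on a held-out split rather than on a handful of hand-written sentences.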
Clone this repo:
git clone git@github.com:ramachandra742/Document-Classification.git
Reference:
Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning, by Benjamin Bengfort, Rebecca Bilbro, and Tony Ojeda