This project tackles a multi-label classification problem: predicting attributes of a blog author from the text they wrote.
Tasks:
- Downloading the dataset from Kaggle and loading the corpus into a dataframe.
- Pre-processing the textual data.
- Building a model that predicts attributes of the author from the text.
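The pre-processing and modelling tasks can be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: it assumes scikit-learn is installed, the `clean_text` helper is hypothetical, and the four toy posts and their labels are invented for the example rather than taken from the corpus. TF-IDF features with one logistic regression per label is just one common baseline for multi-label text classification.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer


def clean_text(text: str) -> str:
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


# Toy rows invented for illustration only; they are NOT taken from the corpus.
posts = [
    ("Studying for my exams at school today!!", ["male", "Student"]),
    ("My school homework is due tomorrow...", ["female", "Student"]),
    ("Deployed the new server code at work.", ["male", "Technology"]),
    ("Debugging server code all night again", ["female", "Technology"]),
]
texts = [clean_text(text) for text, _ in posts]
label_sets = [labels for _, labels in posts]

# Turn each post's label set into a binary indicator row (one column per label).
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(label_sets)

# One independent binary classifier per label, on top of TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, y)

# Predict an indicator row for a new post and map it back to label names.
predicted = model.predict([clean_text("School exams next week")])
print(binarizer.inverse_transform(predicted))
```

Because every post carries several labels at once, the targets are encoded as an indicator matrix instead of a single class column; `inverse_transform` converts predictions back into readable label tuples.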
The dataset is the Blog Authorship Corpus, available on Kaggle: https://www.kaggle.com/datasets/rtatman/blog-authorship-corpus
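Once the archive has been downloaded and unpacked, loading the corpus into a dataframe might look like the sketch below. It assumes pandas is installed; the path `data/raw/blogtext.csv` and the `load_corpus` helper are assumptions made for illustration, so adjust the file name to match the actual download.

```python
from pathlib import Path

import pandas as pd

# Assumed location and file name; change this to wherever the download was unpacked.
RAW_CSV = Path("data/raw/blogtext.csv")


def load_corpus(path, nrows=None):
    """Read the corpus CSV into a DataFrame, optionally only the first `nrows` rows."""
    return pd.read_csv(path, nrows=nrows)


if RAW_CSV.exists():
    # Load a subset first; the full corpus is large and slow to iterate on.
    df = load_corpus(RAW_CSV, nrows=10_000)
    print(df.shape)
    print(df.columns.tolist())
```

Reading only the first rows while experimenting keeps memory use and turnaround time low; pass `nrows=None` to load everything.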
Project structure:
.
├── data # data files location
│ ├── final # Store final clean and grouped data here.
│ ├── processed # Store processed files here, intermediate files.
│ └── raw # Store raw data files here.
├── docs # Project-related documents, such as the project report and ideas
├── models # Store the models here
├── notebooks # Store python notebooks here
├── src # Source files
├── tests # Automated tests
└── README.md # Project overview and instructions
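For a fresh checkout, the directory layout above could be created with a short script like this one. It is only a convenience sketch: the `create_layout` helper and the `PROJECT_DIRS` list are hypothetical names, and the directory names are copied from the tree.

```python
from pathlib import Path

# Directory names copied from the tree above.
PROJECT_DIRS = [
    "data/final",
    "data/processed",
    "data/raw",
    "docs",
    "models",
    "notebooks",
    "src",
    "tests",
]


def create_layout(root="."):
    """Create the project directories under `root`; existing ones are left untouched."""
    created = []
    for name in PROJECT_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout()` from the repository root is safe to repeat, since `exist_ok=True` skips directories that already exist.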
Contributors:
Rahul Thorat
Bharat Singh Rajpurohit
Anisha Birje