In this file, you can write anything relevant to the work we are doing.
- Readability: Your code will be read by everyone in the group, so we should all strive to write readable code. Adding comments that explain what your code does will help.
- Modularity: Break your code down into smaller modules or functions to promote reuse and maintainability.
- Consistency: Follow a consistent coding style and naming conventions throughout the project.
- Individual data handling, each member working in a separate Jupyter notebook (due 2023/10/29)
- Selection of the best parts of each member's work (due 2023/10/5)
- Export of the processed data to the Data folder, to be used for the binary classification (due 2023/10/5)
- Building the models (no due date yet)
- Writing the report (no due date yet)
- Delivery of the work (due 2023/12/22)
1. Handling the data:
- Data Exploration: Describe the data and extract insights that help address the problem. Avoid visualizations and other elements that do not contribute to addressing the problem at hand.
- Initial data preprocessing: This section covers the initial preprocessing of the data. It should unambiguously explain each step taken to transform the raw data into a form usable by the predictive models, along with the rationale behind it.
2. Predicting:
- Binary Classification: Describe your strategy for the text classification objective. This section is divided into the following components:
  - Kaggle Performance
  - Additional preprocessing (including feature selection)
  - Modelling approach (model assessment, e.g. holdout or cross-validation, and algorithms used)
  - Performance assessment (choice of metrics and interpretation of results)
- Multiclass Classification: Describe your strategy for the multiclass classification objective. This section is divided into the following components:
  - Additional preprocessing (including feature selection): 1v
  - Modelling approach (model assessment, e.g. holdout or cross-validation, and algorithms used): 1v
  - Performance assessment (choice of metrics and interpretation of results): 2v