This project comprises two main components:
- Training a Classification Model: This component trains a classifier that predicts one categorical feature from free-text input.
- Unsupervised Clustering and Visualization: This component runs an unsupervised model to identify and visualize clusters in the dataset, giving insight into the structure and patterns inherent in the data.
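
To make the two components concrete, here is a minimal, dependency-free sketch of both ideas. It is not the project's implementation: the project relies on NLP models from Hugging Face, whereas this sketch swaps in plain bag-of-words vectors, a nearest-centroid classifier, and a toy k-means-style loop. All function names and the example texts and labels are invented for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Turn free text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum the vectors; only the direction matters for cosine similarity."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def train_classifier(texts, labels):
    """Build one centroid per category from labelled free text."""
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(vectorize(text))
    return {label: centroid(vs) for label, vs in by_label.items()}


def predict(model, text):
    """Return the category whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(model, key=lambda label: cosine(model[label], vec))


def cluster(texts, k, iterations=10):
    """Toy k-means-style clustering; the first k texts seed the clusters."""
    vectors = [vectorize(t) for t in texts]
    centroids = vectors[:k]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each text to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(centroids[c], v)) for v in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment


# Invented toy data, standing in for a real dataset.
texts = [
    "chest pain and fever",
    "broken arm after fall",
    "fever and cough",
    "fall with broken leg",
]
labels = ["medical", "trauma", "medical", "trauma"]

model = train_classifier(texts, labels)
predicted = predict(model, "persistent cough and fever")
clusters = cluster(texts, k=2)
```

The classifier uses the labels; the clustering step ignores them and groups texts only by word overlap, which is the distinction between the two components.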
Download the required NLP models from Hugging Face:
Follow the instructions on the respective pages to download and place the models in the appropriate directory.
This project was developed with Python 3.8.16.
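
If you want to confirm that your interpreter matches before running anything, a small check like the following may help. It is only a sketch: the helper name `python_version_matches` is hypothetical, and comparing just the major and minor version (3.8) is an assumption about how strict the requirement is.

```python
import sys

# Major/minor of the version stated in this README (3.8.16).
EXPECTED = (3, 8)


def python_version_matches(version_info=sys.version_info, expected=EXPECTED):
    """Return True if the interpreter's major.minor equals the expected pair."""
    return tuple(version_info[:2]) == expected


if not python_version_matches():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Warning: developed with Python 3.8.16, but running {found}")
```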
Please note that the dataset for which this code was written cannot be made publicly available without authorization from the French data protection authority, the Commission Nationale de l'Informatique et des Libertés (CNIL). You will need to replace the placeholder with your own dataset before running the code.
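
As a rough illustration of what a replacement dataset could look like, the sketch below reads a CSV with one free-text column and one categorical column. The file layout, the column names `text` and `label`, and the helper names are all assumptions made for this example; adapt them to whatever format the code actually expects.

```python
import csv


def load_rows(lines, text_column="text", label_column="label"):
    """Parse CSV lines into (text, label) pairs, skipping rows with empty text.

    `lines` can be an open file or any iterable of strings.
    The column names are hypothetical defaults.
    """
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    missing = [c for c in (text_column, label_column) if c not in fieldnames]
    if missing:
        raise ValueError(f"Missing column(s): {', '.join(missing)}")
    return [
        (row[text_column].strip(), row[label_column].strip())
        for row in reader
        if row[text_column] and row[text_column].strip()
    ]


def load_dataset(path, **columns):
    """Open a UTF-8 CSV file at `path` and parse it with load_rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return load_rows(handle, **columns)
```

For example, `load_dataset("my_data.csv")` would return a list of `(text, label)` tuples, where `my_data.csv` is a placeholder name for your own file.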