This project classifies whether a given tweet is about a disaster. It uses a transformer-based classifier combined with the AdaBoost technique.
- Data Loading and Preprocessing
- Analyzing and displaying the most popular tweet locations on an interactive map
- Analyzing the class distribution in the training set
- Cleaning the text: removing punctuation, special characters, and emojis
- Loading pretrained GloVe vectors from an external file for word vectorization
- Tokenizing tweets and creating word vectors
- Splitting data into training and validation sets
- Implementing the AdaBoost mechanism
- Implementing a transformer-based classifier
- Training the classifier and evaluating the results
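The text-cleaning step from the list above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `clean_tweet`, the emoji ranges in `EMOJI_PATTERN`, and the lowercasing are all assumptions made for the example.

```python
import re
import string

# Assumed, non-exhaustive set of Unicode ranges that cover common emojis.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols and pictographs
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F1E0-\U0001F1FF"  # regional indicator (flag) symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]+"
)

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Hypothetical helper: strip emojis, punctuation, and special characters."""
    text = EMOJI_PATTERN.sub(" ", text)          # remove emojis
    text = text.translate(PUNCT_TABLE)           # remove punctuation
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # remove remaining special characters
    text = re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace
    return text.lower()                          # lowercasing is an added assumption


print(clean_tweet("Forest fire near La Ronge, Sask. Canada!!! \U0001F525"))
```

With pandas, a function like this could be applied to the tweet text column via `Series.apply` before tokenization.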
After training the classifier, we achieved an accuracy of approximately 80% on the validation set.
To run this project, you need the following libraries and tools:
- pandas
- matplotlib
- numpy
- re (part of the Python standard library, no installation needed)
- geopy
- folium
- nltk
- keras
- tensorflow
- sklearn (installed as the scikit-learn package)
- livelossplot
The training and test datasets are provided in the files train.csv and test.csv. Additionally, you need to download the file glove.twitter.27B.200d.txt and place it in the project directory.
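As a rough sketch of how the GloVe file might be used, the snippet below parses the text format (one token per line followed by its space-separated vector components) and builds an embedding matrix. The helper names `load_glove` and `build_embedding_matrix` are hypothetical, and the `word_index` mapping is assumed to start at 1 with row 0 reserved for padding, as the Keras `Tokenizer` does; the notebook may organize this differently.

```python
import numpy as np


def load_glove(handle, dim=200):
    """Hypothetical helper: read GloVe text lines into a {word: vector} dict."""
    embeddings = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not hold exactly one token plus dim values
        embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings


def build_embedding_matrix(word_index, embeddings, dim=200):
    """Hypothetical helper: one row per token index; unknown words stay all-zero."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix


# Example usage, assuming the file sits in the project directory:
# with open("glove.twitter.27B.200d.txt", encoding="utf-8") as f:
#     glove = load_glove(f, dim=200)
```

A matrix built this way could serve as the initial weights of a Keras `Embedding` layer, so that each token index maps to its pretrained 200-dimensional vector.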
To run this project, follow these steps:
- Install the required libraries if you haven't already.
- Download the training and test datasets from the appropriate sources and save them as train.csv and test.csv.
- Run the Jupyter Notebook file switch_transformers.ipynb.