DataTalks.Club's Capstone 2 project by Alexander D. Rios
Source: EarthDaily Analytics
This repository was created as part of the DataTalks.Club's Machine Learning Zoomcamp by Alexey Grigorev.
This project has been submitted as the Capstone 2 project for the course.
Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they're observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter (e.g., disaster relief organizations and news agencies).
But it's not always clear whether a person's words are actually announcing a disaster. Take this example:
The author explicitly uses the word "ABLAZE" but means it metaphorically. This is clear to a human right away, especially with the visual aid, but it's less clear to a machine.
In this competition, we're challenged to build a machine learning model that predicts which tweets are about real disasters and which ones aren't. We'll have access to a dataset of 10,000 tweets that were hand-classified.
Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.
This dataset was created by the company Figure Eight and originally shared on their "Data For Everyone" website here.
Tweet source: https://twitter.com/AnyOtherAnnaK/status/629195955506708480
You are predicting whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.
You can find the competition in the following link: https://www.kaggle.com/competitions/nlp-getting-started
This particular challenge is perfect for data scientists looking to get started with Natural Language Processing.
- π README.md
- π Pipfile
- π Pipfile.lock
- π analysis
- π dataset
- π 479k-english-words
- ποΈ english_words_479k.txt
- π english-word-frequency
- ποΈ unigram_freq.csv
- π nlp-getting-started
- ποΈ sample_submission.csv
- ποΈ test.csv
- ποΈ train.csv
- π 479k-english-words
- π etc
- π deploy.sh
- βοΈ gateway-deployment-service.yaml
- βοΈ gateway-deployment.yaml
- π gateway.dockerfile
- βοΈ kind-config.yaml
- βοΈ metallb-configmap.yaml
- βοΈ model-deployment.yaml
- βοΈ model-service.yaml
- βοΈ nginx-ingress.yaml
- π serving.dockerfile
- π models
- π€ model_base.h5
- π€ tokenizer.bin
- π scripts
- π load_data.py
- π model_conversor.py
- π model_serving.py
- π test.py
- π train.py
- π utils.py
- π disaster_tweets_model
- π fingerprint.pb
- π saved_model.pb
- π variables
- π variables.data-00000-of-00001
- π variables.index
- π streamlit_app
- π my_app.py
- π requirements.txt
- ποΈ train_clean.csv
In this project, I used the following dataset: Natural Language Processing with Disaster Tweets
You can download it with the following code:
!pip install kagglehub

import kagglehub

# Log in with your Kaggle account (opens an interactive prompt)
kagglehub.login()

# Download the competition data, the auxiliary word lists, and the base BERT model
nlp_getting_started_path = kagglehub.competition_download('nlp-getting-started')
rtatman_english_word_frequency_path = kagglehub.dataset_download('rtatman/english-word-frequency')
yk1598_479k_english_words_path = kagglehub.dataset_download('yk1598/479k-english-words')
keras_bert_keras_bert_small_en_uncased_2_path = kagglehub.model_download('keras/bert/Keras/bert_small_en_uncased/2')

print('Data source import complete.')
You need to log in with your Kaggle credentials (username and API key). For more help, refer to the kagglehub repository.
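Alternatively, kagglehub can authenticate non-interactively through the standard Kaggle environment variables:

```bash
# Set these before running the download script to skip the interactive login
export KAGGLE_USERNAME=<your_username>
export KAGGLE_KEY=<your_api_key>
```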
The training dataset contains 7,613 tweet records.
Each sample in the train and test set has the following information:
- The `text` of a tweet
- A `keyword` from that tweet (although this may be blank!)
- The `location` the tweet was sent from (may also be blank)
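For example, you can peek at these fields with pandas (the path assumes the repository's dataset layout shown above):

```python
import pandas as pd

# Load the training split and inspect the fields described above
train = pd.read_csv("dataset/nlp-getting-started/train.csv")
print(train.shape)  # (7613, 5): id, keyword, location, text, target
print(train[["keyword", "location", "text", "target"]].head())
print(train[["keyword", "location"]].isna().sum())  # keyword/location may be blank
```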
Submissions are evaluated using F1 between the predicted and expected answers. F1 is calculated as follows:
$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
where:
$\mathrm{precision} = \frac{TP}{TP+FP} \qquad \mathrm{recall} = \frac{TP}{TP+FN}$
and:
- True Positive [TP] = your prediction is 1, and the ground truth is also 1 - you predicted a positive, and that's true!
- False Positive [FP] = your prediction is 1, and the ground truth is 0 - you predicted a positive, and that's false.
- False Negative [FN] = your prediction is 0, and the ground truth is 1 - you predicted a negative, and that's false.
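As a quick sanity check of the metric, scikit-learn's f1_score matches this definition:

```python
# Illustrative F1 computation with scikit-learn
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0]  # ground truth
y_pred = [1, 0, 0, 1, 1]  # predictions: TP=2, FP=1, FN=1
# precision = 2/3, recall = 2/3 -> F1 = 2 * (2/3 * 2/3) / (2/3 + 2/3) = 0.667
print(f1_score(y_true, y_pred))
```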
The dataset analysis and model training were conducted in a Jupyter notebook; you can find them in the file nlp-with-disaster-tweets.ipynb.
The training script is available in the train.py script.
For deployment, I used the trained classification model model_base.h5. The script that serves it with Flask is model_serving.py.
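For orientation, a minimal sketch of such a gateway is shown below. It is not the repo's implementation: the real model_serving.py also applies the saved tokenizer (tokenizer.bin), and the tensor names ('text_input', 'output_0') and JSON schema here are assumptions; check the exported SavedModel signature for the real ones.

```python
# A minimal Flask gateway sketch that forwards a tweet to TF Serving over gRPC.
import os

import grpc
import tensorflow as tf
from flask import Flask, jsonify, request
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# TF Serving's gRPC endpoint (port 8500 in the docker commands below)
host = os.getenv('TF_SERVING_HOST', 'localhost:8500')
channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

app = Flask('gateway')

@app.route('/predict', methods=['POST'])
def predict():
    tweet = request.get_json()['text']
    # Build the gRPC request for the model served as 'disaster_tweets_model'
    grpc_request = predict_pb2.PredictRequest()
    grpc_request.model_spec.name = 'disaster_tweets_model'
    grpc_request.model_spec.signature_name = 'serving_default'
    grpc_request.inputs['text_input'].CopyFrom(tf.make_tensor_proto([tweet]))
    grpc_response = stub.Predict(grpc_request, timeout=20.0)
    score = float(grpc_response.outputs['output_0'].float_val[0])
    return jsonify({'disaster': score >= 0.5, 'probability': score})

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=9696)
```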
Pipfile and Pipfile.lock set up the Pipenv environment.
First, you need to install from Pipfile:
pipenv install
The virtual environment can be activated by running
pipenv shell
Once in the virtual environment, you can run the following command:
python ./scripts/model_serving.py
Furthermore, you need to serve the model, in a separate terminal, with the following command:
docker run -it --rm -p 8500:8500 -v "$(pwd)/scripts/disaster_tweets_model:/models/disaster_tweets_model/1" -e MODEL_NAME=disaster_tweets_model tensorflow/serving:2.14.0
Then, you will be ready to test the model by running the following command:
python ./scripts/test.py
Don't forget to update the `url` variable in the test.py file to:
url = "http://localhost:9696/predict"
Also, you must update the `host` variable in the model_serving.py file to:
host = os.getenv('TF_SERVING_HOST', 'localhost:8500')
Once in the virtual environment, you can run the following command:
waitress-serve --listen=0.0.0.0:9696 scripts.model_serving:app
Before that, you need to serve the model using the following command:
docker run -it --rm -p 8500:8500 -v "$(pwd)/scripts/disaster_tweets_model:/models/disaster_tweets_model/1" -e MODEL_NAME=disaster_tweets_model tensorflow/serving:2.14.0
And then, you can test the model by running the following command:
python ./scripts/test.py
Don't forget to update the `url` variable in the test.py file to:
url = "http://localhost:9696/predict"
Also, you must update the `host` variable in the model_serving.py file to:
host = os.getenv('TF_SERVING_HOST', 'localhost:8500')
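For reference, test.py essentially posts a tweet to the gateway. A minimal client sketch follows; the exact JSON schema is an assumption, so check scripts/test.py for the real payload:

```python
import requests

url = "http://localhost:9696/predict"
# "Forest fire near La Ronge Sask. Canada" is a sample tweet from train.csv
tweet = {"text": "Forest fire near La Ronge Sask. Canada"}

response = requests.post(url, json=tweet)
print(response.json())
```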
To deploy the model with Kubernetes, you will create the following structures and connections:
First, you need to build:
- The TensorFlow Serving image.
- The Gateway image.
To achieve this, I created two separate Dockerfiles:
- serving.dockerfile -- Contains the instructions to serve the TensorFlow model in `saved_model` format (disaster_tweets_model).
- gateway.dockerfile -- Contains the instructions to deploy the model_serving.py gateway and install its dependencies.
To build them, you can use the following commands:
docker build -t tf-serving-disaster-tweets-model -f ./etc/serving.dockerfile .
docker build -t serving-gateway-disaster-tweets-model -f ./etc/gateway.dockerfile .
You must tag and push them to your repository:
docker tag tf-serving-disaster-tweets-model <YOUR_USERNAME>/tf-serving-disaster-tweets-model
docker tag serving-gateway-disaster-tweets-model <YOUR_USERNAME>/serving-gateway-disaster-tweets-model
docker push <YOUR_USERNAME>/tf-serving-disaster-tweets-model:latest
docker push <YOUR_USERNAME>/serving-gateway-disaster-tweets-model:latest
You can also pull them from my repository by using the following commands:
docker pull aletbm/tf-serving-disaster-tweets-model:latest
docker pull aletbm/serving-gateway-disaster-tweets-model:latest
To deploy locally using Docker, you must execute the following commands in two separate terminals:
- To serve the model:
docker run -it --rm -p 8500:8500 tf-serving-disaster-tweets-model:latest
- To deploy the gateway:
docker run -it --rm -p 9696:9696 serving-gateway-disaster-tweets-model:latest
Then, you will be ready to test the model by running the following command:
python ./scripts/test.py
Don't forget to update the `url` variable in the test.py file to:
url = "http://localhost:9696/predict"
To deploy locally using Kubernetes and Docker, you must replace my Docker username with your Docker username in:
- The model-deployment.yaml file configuration.
spec:
  containers:
    - name: tf-serving-disaster-tweets-model
      image: <YOUR_USERNAME>/tf-serving-disaster-tweets-model:latest
      ports:
        - containerPort: 8500
- The gateway-deployment.yaml file configuration.
spec:
  containers:
    - name: serving-gateway-disaster-tweets-model
      image: <YOUR_USERNAME>/serving-gateway-disaster-tweets-model:latest
      ports:
        - containerPort: 9696
Up to this point, you have built and pushed all the necessary images, and all configuration files have been corrected.
You need to install Kubernetes and Kind. I won't explain how to install them, but you can refer to their respective documentation pages.
Now, you need to create a Kubernetes cluster with Kind and apply all configuration files. To do this, you have two options:
- Do it manually by executing each command individually.
- Do it automatically by executing the deploy.sh script.
Manually, you must execute the following commands:
kind create cluster --config kind-config.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-deployment-service.yaml
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl apply -f nginx-ingress.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/metallb.yaml
kubectl apply -f metallb-configmap.yaml
kubectl get pods -n metallb-system --watch
Automatically, you must execute the following command in a bash terminal:
cd etc
./deploy.sh
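For reference, a deploy.sh along these lines would automate the manual steps above (a sketch; the script shipped in etc/ is authoritative):

```bash
#!/bin/bash
set -e  # abort on the first failing command

kind create cluster --config kind-config.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-deployment-service.yaml
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl apply -f nginx-ingress.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/metallb.yaml
kubectl apply -f metallb-configmap.yaml
kubectl get pods -n metallb-system --watch
```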
Once all pods are running, you can test the deployment by running the following command:
python ./scripts/test.py
Don't forget to update the `url` variable in the test.py file to:
url = "http://localhost:80/predict"
I've included a GIF that shows how to perform a deployment:
Additionally, I developed a very simple app using Streamlit to deploy my model, where you can enter a tweet and obtain a prediction.
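For orientation, a minimal front end along those lines is sketched below; my_app.py in the repo is the real implementation, and the gateway URL and JSON schema here are assumptions.

```python
import requests
import streamlit as st

st.title("Disaster Tweet Classifier")
tweet = st.text_area("Enter a tweet:")

# Forward the tweet to the serving gateway and show the prediction
if st.button("Predict") and tweet:
    response = requests.post("http://localhost:9696/predict", json={"text": tweet})
    st.write(response.json())
```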
(See streamlit.mp4 in the repository for a demo of the app.)
Here's the link to my Streamlit App.