Capstone project using NLP to detect if tweets describe real disasters. Built on 10,000 labeled tweets to support emergency response with AI. Includes a Streamlit app for real-time tweet classification.

DataTalks.Club's Capstone 2 project by Alexander D. Rios

🌋 Natural Language Processing with Disaster Tweets

Source: EarthDailyAnalytics

Kaggle · Open In Colab · Streamlit App

This repository was created as part of the DataTalks.Club's Machine Learning Zoomcamp by Alexey Grigorev.

This project has been submitted as the Capstone 2 project for the course.


Overview

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they're observing in real time. Because of this, more agencies are interested in programmatically monitoring Twitter (e.g., disaster relief organizations and news agencies).

But it's not always clear whether a person's words are actually announcing a disaster. Take this example:

The author explicitly uses the word "ABLAZE" but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it's less clear to a machine.

In this competition, we're challenged to build a machine learning model that predicts which tweets are about real disasters and which ones aren't. We'll have access to a dataset of 10,000 tweets that were hand-classified.

Disclaimer: The dataset for this competition contains text that may be considered profane, vulgar, or offensive.

Acknowledgments

This dataset was created by the company figure-eight and originally shared on their 'Data For Everyone' website here.

Tweet source: https://twitter.com/AnyOtherAnnaK/status/629195955506708480

What am I predicting?

You are predicting whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.

Kaggle competition

You can find the competition in the following link: https://www.kaggle.com/competitions/nlp-getting-started

This particular challenge is perfect for data scientists looking to get started with Natural Language Processing.


Directory structure

📂 NLP_with_Disaster_Tweets_Capstone_2


Downloading the dataset

In this project, I used the following dataset: Natural Language Processing with Disaster Tweets

banner_dataset

You can download it with the following code:

# Install the KaggleHub client
!pip install kagglehub

import kagglehub

# Authenticate with your Kaggle credentials
kagglehub.login()

# Download the competition data, the auxiliary word lists, and the BERT model
nlp_getting_started_path = kagglehub.competition_download('nlp-getting-started')
rtatman_english_word_frequency_path = kagglehub.dataset_download('rtatman/english-word-frequency')
yk1598_479k_english_words_path = kagglehub.dataset_download('yk1598/479k-english-words')
keras_bert_keras_bert_small_en_uncased_2_path = kagglehub.model_download('keras/bert/Keras/bert_small_en_uncased/2')

print('Data source import complete.')

You need to log in with your Kaggle credentials (username and API token). For more help, refer to the KaggleHub repository.

Dataset Description

The training set contains 7,613 labeled tweets.

Each sample in the train and test set has the following information:

  • The text of a tweet
  • A keyword from that tweet (although this may be blank!)
  • The location the tweet was sent from (may also be blank)
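A minimal sketch of this schema in pandas. The column names follow the Kaggle competition files; the rows below are illustrative only, not real dataset records:

```python
import pandas as pd

# Illustrative rows only -- not real dataset records; the column names
# follow the Kaggle competition files (train.csv / test.csv).
train = pd.DataFrame({
    "id": [1, 2],
    "keyword": ["ablaze", None],       # may be blank
    "location": [None, "California"],  # may be blank
    "text": ["Sky ABLAZE at the concert tonight!",
             "Forest fire near La Ronge Sask. Canada"],
    "target": [0, 1],                  # 1 = real disaster, 0 = not (train set only)
})

print(train[["keyword", "location", "target"]])
```

The test set has the same columns minus `target`, which is what you predict.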

Evaluation

Submissions are evaluated using F1 between the predicted and expected answers. F1 is calculated as follows:

$F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$

where:

  • $precision=\frac{TP}{TP+FP}$
  • $recall=\frac{TP}{TP+FN}$

and:

  • True Positive [TP] = your prediction is 1, and the ground truth is also 1 - you predicted a positive, and that's true!
  • False Positive [FP] = your prediction is 1, and the ground truth is 0 - you predicted a positive, and that's false.
  • False Negative [FN] = your prediction is 0, and the ground truth is 1 - you predicted a negative, and that's false.
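A worked example of these definitions, using toy labels (the values are made up, not project results):

```python
# Toy labels and predictions to illustrate the metric (values are made up).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

precision = tp / (tp + fp)  # 3/4 = 0.75
recall = tp / (tp + fn)     # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.2f}")  # F1 = 0.75
```

In practice you would compute the same value with `sklearn.metrics.f1_score(y_true, y_pred)`.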

Dataset analysis and model training

The dataset analysis and model training were conducted in a Jupyter notebook. You can find them in the file named nlp-with-disaster-tweets.ipynb.

The training script is available in the train.py script.

For deployment, I used a trained classification model named model_base.h5.


Running the project locally

Using Flask

The script to deploy the model using Flask is model_serving.py

Pipfile and Pipfile.lock set up the Pipenv environment.

First, you need to install from Pipfile:

pipenv install

The virtual environment can be activated by running

pipenv shell

Once in the virtual environment, you can run the following command:

python ./scripts/model_serving.py

Additionally, you need to serve the model with the following command:

docker run -it --rm -p 8500:8500 -v "$(pwd)/scripts/disaster_tweets_model:/models/disaster_tweets_model/1" -e MODEL_NAME=disaster_tweets_model tensorflow/serving:2.14.0

Then, you will be ready to test the model by running the following command:

python ./scripts/test.py

Don't forget to update the url variable in the test.py file to:

url = "http://localhost:9696/predict"

Also, you must update the host in the model_serving.py file to:

host = os.getenv('TF_SERVING_HOST', 'localhost:8500')
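For orientation, a sketch of what the request in test.py might look like. The `"tweet"` payload field and the endpoint behavior are assumptions; check scripts/test.py for the exact payload the gateway expects:

```python
import json
from urllib import request

# Sketch of a JSON POST to the gateway's /predict endpoint.
# The "tweet" field name is an assumption -- see scripts/test.py
# for the actual payload format.
def build_request(url: str, text: str) -> request.Request:
    payload = json.dumps({"tweet": text}).encode("utf-8")
    return request.Request(url, data=payload,
                           headers={"Content-Type": "application/json"})

req = build_request("http://localhost:9696/predict",
                    "Forest fire near La Ronge Sask. Canada")
print(req.full_url)
# With the Flask gateway and TF Serving running, send it with:
#   json.load(request.urlopen(req))
```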

Using Waitress as WSGI server

Once in the virtual environment, you can run the following command:

waitress-serve --listen=0.0.0.0:9696 scripts.model_serving:app

Before that, you need to serve the model using the following command:

docker run -it --rm -p 8500:8500 -v "$(pwd)/scripts/disaster_tweets_model:/models/disaster_tweets_model/1" -e MODEL_NAME=disaster_tweets_model tensorflow/serving:2.14.0

And then, you can test the model by running the following command:

python ./scripts/test.py

Don't forget to update the url variable in the test.py file to:

url = "http://localhost:9696/predict"

Also, you must update the host in the model_serving.py file to:

host = os.getenv('TF_SERVING_HOST', 'localhost:8500')

Local deployment with Kubernetes, Kind and Docker

To deploy the model with Kubernetes, you will create the following structures and connections (see the deployment diagram): deployment

First, you need to build:

  • The TensorFlow Serving image.
  • The Gateway image.

To achieve this, I created two separate Dockerfiles:

To build them, you can use the following commands:

docker build -t tf-serving-disaster-tweets-model -f ./etc/serving.dockerfile .

docker build -t serving-gateway-disaster-tweets-model -f ./etc/gateway.dockerfile .

You must tag and push them to your repository:

docker tag tf-serving-disaster-tweets-model <YOUR_USERNAME>/tf-serving-disaster-tweets-model
docker tag serving-gateway-disaster-tweets-model <YOUR_USERNAME>/serving-gateway-disaster-tweets-model

docker push <YOUR_USERNAME>/tf-serving-disaster-tweets-model:latest
docker push <YOUR_USERNAME>/serving-gateway-disaster-tweets-model:latest

You can also pull them from my repository by using the following commands:

docker pull aletbm/tf-serving-disaster-tweets-model:latest

docker pull aletbm/serving-gateway-disaster-tweets-model:latest

To deploy locally using Docker, you must execute the following commands in two separate terminals:

  • To serve the model:
    docker run -it --rm -p 8500:8500 tf-serving-disaster-tweets-model:latest
    
  • To deploy the gateway:
    docker run -it --rm -p 9696:9696 serving-gateway-disaster-tweets-model:latest
    

Then, you will be ready to test the model by running the following command:

python ./scripts/test.py

Don't forget to update the url variable in the test.py file to:

url = "http://localhost:9696/predict"

To deploy locally using Kubernetes and Docker, you must replace my Docker username with your Docker username in:

  • The model-deployment.yaml file configuration.
    spec:
      containers:
      - name: tf-serving-disaster-tweets-model
        image: <YOUR_USERNAME>/tf-serving-disaster-tweets-model:latest
        ports:
        - containerPort: 8500
    
  • The gateway-deployment.yaml file configuration.
    spec:
      containers:
      - name: serving-gateway-disaster-tweets-model
        image: <YOUR_USERNAME>/serving-gateway-disaster-tweets-model:latest
        ports:
        - containerPort: 9696
    

Up to this point, you have built and pushed all the necessary images, and all configuration files have been corrected.
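For orientation, a sketch of what model-service.yaml might contain; the service name, selector labels, and type are assumptions, so check the actual file in the repository. It needs to expose TF Serving's port 8500 so the gateway can reach it via TF_SERVING_HOST:

```yaml
# Sketch only -- name and selector labels are assumptions;
# see model-service.yaml in the repository for the real values.
apiVersion: v1
kind: Service
metadata:
  name: tf-serving-disaster-tweets-model
spec:
  selector:
    app: tf-serving-disaster-tweets-model
  ports:
    - port: 8500
      targetPort: 8500
```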

You need to install Kubernetes and Kind. I won't explain how to install them, but you can refer to their respective documentation pages.

Now, you need to create a Kubernetes cluster with Kind and apply all configuration files. To do this, you have two options:

  • Do it manually by executing each command individually.
  • Do it automatically by executing the deploy.sh script.

Manually, you must execute the following commands:

kind create cluster --config kind-config.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml

kubectl apply -f model-deployment.yaml
kubectl apply -f model-service.yaml
kubectl apply -f gateway-deployment.yaml
kubectl apply -f gateway-deployment-service.yaml

kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl apply -f nginx-ingress.yaml

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.12.1/manifests/metallb.yaml
kubectl apply -f metallb-configmap.yaml
kubectl get pods -n metallb-system --watch

Automatically, you must execute the following commands in a bash terminal:

cd etc

./deploy.sh

Once all pods are running, you can test the deployment by running the following command:

python ./scripts/test.py

Don't forget to update the url variable in the test.py file to:

url = "http://localhost:80/predict"

Testing the deployment

I've included a GIF that shows how to perform a deployment:

kubernetes

Streamlit App

I also developed a simple app using Streamlit to deploy my model, where you can enter a tweet and obtain a prediction.

streamlit.mp4

Here's the link to my Streamlit App.
