A web app based on NLP and supervised machine learning.
Type in your disaster report message and get the categories immediately.
Following a disaster, responders receive millions of communications, either directly or through social media platforms. Different organizations take care of different parts of the problem, so each one has to filter out the messages that are most important and relevant to it in order to respond immediately.
In this project, I will build an NLP and supervised machine learning model to classify the disaster-related messages into different categories and help different organizations get the messages they need to respond to.
The data set contains real messages that were sent during disaster events and is provided by Figure Eight.
⭐ The project uses these modules. sys, re, json, and pickle ship with Python; pip install the rest.
- sys: system-specific parameters and functions
- pandas: data processing
- numpy: linear algebra
- re: regular expressions
- json: JSON encoder and decoder
- sqlalchemy: SQL toolkit
- nltk: natural language processing
- scikit-learn: machine learning
- pickle: save the machine learning model locally
- joblib: load the machine learning model
- flask: web framework
- plotly: front-end visualizations
Run the following command in the project's root directory to set up the database:
# Run ETL pipeline that cleans the data and stores it in the database
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
After running this, you will have a database file called "DisasterResponse.db" in your data folder.
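The ETL step above can be sketched roughly as follows. This is a minimal sketch, not the contents of process_data.py: it assumes the two CSV files share an `id` column and that the categories file stores labels as one string such as `related-1;water-0`. The function names, the table name `messages`, and the toy rows are all made up for illustration, and it uses the standard-library `sqlite3` connection instead of sqlalchemy to keep the example self-contained.

```python
import sqlite3

import pandas as pd


def clean_data(messages, categories):
    """Merge messages with their labels and expand the label string into 0/1 columns."""
    df = messages.merge(categories, on="id")
    # Split "related-1;water-0" into one column per category.
    expanded = df["categories"].str.split(";", expand=True)
    # Column names are the text before the last dash in the first row.
    expanded.columns = [value.rsplit("-", 1)[0] for value in expanded.iloc[0]]
    for column in expanded.columns:
        # Keep only the trailing digit and store it as an integer flag.
        expanded[column] = expanded[column].str.rsplit("-", n=1).str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), expanded], axis=1)
    return df.drop_duplicates()


def save_data(df, connection, table="messages"):
    """Write the cleaned frame to a SQLite table (hypothetical table name)."""
    df.to_sql(table, connection, index=False, if_exists="replace")


# Toy input standing in for the two CSV files (assumed layout).
messages = pd.DataFrame({
    "id": [1, 2, 2],
    "message": ["We need water", "Road is flooded", "Road is flooded"],
    "genre": ["direct", "news", "news"],
})
categories = pd.DataFrame({
    "id": [1, 2],
    "categories": ["related-1;water-1;floods-0", "related-1;water-0;floods-1"],
})

clean = clean_data(messages, categories)

connection = sqlite3.connect(":memory:")  # a real run would point at a .db file
save_data(clean, connection)
stored = pd.read_sql("SELECT * FROM messages", connection)
connection.close()
```

Each category becomes its own 0/1 column, which is the shape a multi-output classifier expects as its target.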
Run the following command in the project's root directory to set up the machine learning model:
# Run ML pipeline that trains the classifier and saves it
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
After running this, you will have a pickle file called "classifier.pkl" in your models folder.
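For orientation, here is a minimal sketch of what a training step like this could look like with scikit-learn. It is not the contents of train_classifier.py. The TF-IDF features, the hyperparameters, the toy messages, and the two label columns are assumptions for illustration only; the README confirms just that the model is a multi-output classifier based on Random Forest and that it is saved with pickle.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model(n_estimators=10, random_state=0):
    """Text features followed by one Random Forest per category (assumed setup)."""
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    return Pipeline([
        ("tfidf", TfidfVectorizer(lowercase=True)),
        ("clf", MultiOutputClassifier(forest)),
    ])


# Toy training data; columns stand for two hypothetical categories.
train_messages = [
    "we need water and food",
    "the road is flooded",
    "water is rising on the road",
    "please send food",
]
train_labels = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

model = build_model()
model.fit(train_messages, train_labels)

# Save and reload the fitted pipeline, as the training script does with classifier.pkl.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "classifier.pkl")
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    with open(path, "rb") as handle:
        loaded = pickle.load(handle)

# One row per message, one 0/1 column per category.
prediction = loaded.predict(["we have no food"])
```

Keeping the vectorizer inside the pipeline means the pickle file carries everything needed to classify raw text later.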
🌳 The multi-output classifier is based on Random Forest and has an average accuracy of around 0.9493. The precision is 0.9417 and the recall is 0.9493.
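The README does not say how these averages were computed. One plausible reading, shown here only as a sketch, is to score every category column separately and then take the mean over columns. With support-weighted averaging, recall for a single column equals that column's accuracy, which would be consistent with the two matching numbers above, but that is an assumption. The function name and toy labels are made up.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score


def average_scores(y_true, y_pred):
    """Score each category column on its own, then average over columns."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rows = []
    for col in range(y_true.shape[1]):
        truth, guess = y_true[:, col], y_pred[:, col]
        rows.append((
            accuracy_score(truth, guess),
            # "weighted" averages the per-class score by class support (assumed choice).
            precision_score(truth, guess, average="weighted", zero_division=0),
            recall_score(truth, guess, average="weighted", zero_division=0),
        ))
    accuracy, precision, recall = np.mean(rows, axis=0)
    return {
        "accuracy": float(accuracy),
        "precision": float(precision),
        "recall": float(recall),
    }


# Toy labels: four messages, two categories.
y_true = [[1, 0], [1, 1], [0, 0], [1, 0]]
y_pred = [[1, 0], [0, 1], [0, 0], [1, 1]]
scores = average_scores(y_true, y_pred)
```

Because most categories are rare, a high average accuracy can hide weak performance on the positive class, so per-category reports are worth checking too.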
Run the following command in the app's directory to run your web app.
python run.py
Then go to http://0.0.0.0:3001/ in your browser. If the browser cannot open 0.0.0.0, try http://localhost:3001/ instead.
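To show how a Flask app ends up on that address, here is a minimal sketch. It is not the project's run.py: the `/go` route name, the `query` parameter, the category list, and the keyword-based `classify` stand-in are all assumptions. The real app loads the pickled classifier and renders master.html and go.html instead of returning JSON.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical subset of category names, for illustration only.
KEYWORDS = {
    "related": ("help", "water", "flood"),
    "aid_related": ("water", "food", "medicine"),
    "weather_related": ("flood", "storm", "rain"),
}


def classify(message):
    """Stand-in for model.predict: flag a category when one of its keywords appears."""
    text = message.lower()
    return {name: int(any(word in text for word in words)) for name, words in KEYWORDS.items()}


@app.route("/go")
def go():
    """Classify the message passed as ?query=... and return the flags as JSON."""
    query = request.args.get("query", "")
    return jsonify(query=query, classification=classify(query))


def main():
    # Binding to 0.0.0.0 on port 3001 matches the address given in this README.
    app.run(host="0.0.0.0", port=3001)
```

Calling `main()` starts the server; a request such as `/go?query=We need water` then returns the flags for that message.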
There's an input box for you to type in disaster-related messages on the main page.
The main page also shows an overview of the training dataset. From the charts, we can see that most messages are direct messages or news. Fewer than 10% are from social media.
From the categories' distribution, we can see that 76.9% of the messages are tagged as "related," which is a general category that doesn't provide much information.
In addition, many messages are marked as "aid related," "weather related," and "direct report," so the classifier will be more accurate on messages in these categories. However, there are no records tagged "child alone." If you type in a message reporting a child being alone, the model cannot classify it, because it never saw an example of that category.
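A quick way to spot categories like "child alone" before training is to count the positive examples per label column. The sketch below assumes the labels are stored as 0/1 columns; the column names, the `label_summary` helper, and the toy rows are hypothetical.

```python
import pandas as pd


def label_summary(labels):
    """Count positives per category and mark categories the model cannot learn."""
    summary = pd.DataFrame({
        "positives": labels.sum(),
        "share": labels.mean(),
    })
    # A category with no positive example gives the classifier nothing to learn from.
    summary["trainable"] = summary["positives"] > 0
    return summary.sort_values("positives", ascending=False)


# Toy label table: four messages, three categories.
labels = pd.DataFrame({
    "related": [1, 1, 1, 0],
    "aid_related": [1, 0, 1, 0],
    "child_alone": [0, 0, 0, 0],
})
summary = label_summary(labels)
```

Categories flagged as not trainable will always be predicted as 0, whatever the message says.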
Type in a message reporting a disaster problem and click "Classify Message." The app will take you to the classification result page.
If the model classifies your message into some categories, the categories will be highlighted in the "Result" part.
- app
  - template
    - master.html # main page of web app
    - go.html # classification result page of web app
  - run.py # Flask file that runs app
- data
  - disaster_categories.csv # data to process
  - disaster_messages.csv # data to process
  - process_data.py
  - DisasterResponse.db # database to save clean data to
- models
  - train_classifier.py
  - classifier.pkl # saved model
This project is licensed under the MIT License.