ZhaoyiW/Web-App-Disaster-Messages-Classifier

Disaster-Response License: MIT

A web app based on NLP and supervised machine learning.
Type in your disaster report message and get the categories immediately.

Content

Project Overview

Background

Following a disaster, responders receive millions of messages, either directly or through social media. Different organizations take care of different parts of the problem, so each one has to filter out the messages that are most important and relevant to it and respond immediately.

Project Goal

In this project, I build an NLP and supervised machine learning model that classifies disaster-related messages into categories, so that each organization gets the messages it needs to respond to.

Installations

Data Source

A data set of real messages sent during disaster events, provided by Figure Eight.

Modules

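The project uses the modules below. sys, re, json, and pickle are part of the Python standard library; pip install the others.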

  • sys: system-specific parameters and functions
  • pandas: data processing
  • numpy: linear algebra
  • re: regular expressions
  • json: JSON encoder and decoder
  • sqlalchemy: SQL toolkit
  • nltk: natural language processing
  • scikit-learn: machine learning
  • pickle: save the machine learning model locally
  • joblib: load the machine learning model
  • flask: web framework
  • plotly: front-end visualizations
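
Before running the pipelines, it can help to check which of the third-party modules above are missing. The following is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical, and the mapping only records that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import name -> pip package name. scikit-learn is imported as "sklearn".
THIRD_PARTY = {
    "pandas": "pandas",
    "numpy": "numpy",
    "sqlalchemy": "sqlalchemy",
    "nltk": "nltk",
    "sklearn": "scikit-learn",
    "joblib": "joblib",
    "flask": "flask",
    "plotly": "plotly",
}


def missing_packages(modules=THIRD_PARTY):
    """Return the pip names of modules that cannot be found in this environment."""
    return [
        pip_name
        for import_name, pip_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All third-party modules are available.")
```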

Data Processing and Database Building

Run the following command in the project's root directory to set up the database:

# Run the ETL pipeline that cleans the data and stores it in the database
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db

After running this, you will have a database file called "DisasterResponse.db" in your data folder.
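
As a rough illustration of what an ETL step like this might do, here is a minimal sketch. It is not the code in process_data.py. It assumes that both CSV files share an `id` column and that the categories file stores labels in one string such as `related-1;water-0`; the table name `messages` is also an assumption. The standard-library `sqlite3` module stands in for sqlalchemy to keep the sketch small.

```python
import sqlite3

import pandas as pd


def clean(messages, categories):
    """Merge both tables and expand the category string into 0/1 columns."""
    df = messages.merge(categories, on="id")  # assumed shared key column
    # "related-1;water-0" -> one column per category
    split = df["categories"].str.split(";", expand=True)
    split.columns = [value.split("-")[0] for value in split.iloc[0]]
    for col in split.columns:
        split[col] = split[col].str.split("-").str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), split], axis=1)
    return df.drop_duplicates()


def save(df, conn, table="messages"):
    """Write the cleaned table to a SQLite database connection."""
    df.to_sql(table, conn, index=False, if_exists="replace")


# Toy input, invented for illustration only
toy_messages = pd.DataFrame(
    {"id": [1, 2, 2], "message": ["need water", "storm", "storm"]}
)
toy_categories = pd.DataFrame(
    {"id": [1, 2], "categories": ["related-1;water-1", "related-1;water-0"]}
)

cleaned = clean(toy_messages, toy_categories)
conn = sqlite3.connect(":memory:")
save(cleaned, conn)
```

With a file path instead of `":memory:"`, the same `save` call would produce a database file on disk.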

NLP and Machine Learning Pipeline

Run the following command in the project's root directory to train and save the machine learning model.

# Run the ML pipeline that trains the classifier and saves it
python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl

After running this, you will have a pickle file called "classifier.pkl" in your models folder.
🌳 The multi-output classifier is based on Random Forest and has an average accuracy of around 0.9493, with a precision of 0.9417 and a recall of 0.9493.
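
A multi-output Random Forest classifier of this kind could be assembled with scikit-learn roughly as follows. This is a sketch under assumptions, not the contents of train_classifier.py: the TF-IDF features, the hyperparameters, and the helper names `build_model` and `save_model` are illustrative choices, and the toy data is invented.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model(n_estimators=100, random_state=0):
    """Text features followed by one Random Forest per output category."""
    forest = RandomForestClassifier(
        n_estimators=n_estimators, random_state=random_state
    )
    return Pipeline([
        ("tfidf", TfidfVectorizer()),  # assumed feature extraction step
        ("clf", MultiOutputClassifier(forest)),
    ])


def save_model(model, path):
    """Pickle the fitted model, e.g. to models/classifier.pkl."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Toy data, invented for illustration: columns are two hypothetical categories
toy_messages = [
    "we need water and food",
    "storm destroyed the bridge",
    "please send water",
    "heavy storm and flooding",
]
toy_labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

model = build_model(n_estimators=10)
model.fit(toy_messages, toy_labels)
prediction = model.predict(["please send water"])  # shape: (1, number of categories)
```

Each column of the label matrix gets its own classifier, which is what lets one message receive several categories at once.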

Run the App Locally

Run the following command in the app's directory to run your web app.

python run.py

Then open http://0.0.0.0:3001/ in your browser. If that address does not load, try http://localhost:3001/.
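
For orientation, a Flask file that serves on that address could look roughly like the sketch below. It is not the repository's run.py: the `/go` route, the `query` parameter, and the `classify` stub are assumptions made for illustration, and it returns JSON rather than rendering the HTML templates.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(message):
    """Hypothetical stand-in for the trained model's prediction step."""
    return ["related"] if message.strip() else []


@app.route("/go")
def go():
    # Read the message from the query string, e.g. /go?query=need+water
    query = request.args.get("query", "")
    return jsonify(query=query, categories=classify(query))


def main():
    # Binding to 0.0.0.0 on port 3001 matches the URL given above
    app.run(host="0.0.0.0", port=3001)
```

Calling `main()` starts the development server; the sketch leaves it uncalled so that importing the module has no side effects.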

Web App Overview

The Interface

The main page has an input box where you can type a disaster-related message.
It also shows an overview of the training dataset. The charts show that most messages are direct messages or news; less than 10% come from social media.

The category distribution shows that 76.9% of the messages are tagged as "related," a general category that doesn't provide much information.
Many messages are also marked as "aid related," "weather-related," and "direct report," so the classifier should perform better on messages in these categories. However, there are no records tagged "child alone," so if you type in a message reporting a child being alone, the model cannot classify it because it never saw an example.
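
The distribution described above can be checked with a few lines of code. The sketch below uses invented toy rows, not the real data set, and hypothetical helper names; it computes the share of messages tagged with each category and lists the categories that have no positive example, which a supervised model has no way to learn.

```python
def category_shares(rows, categories):
    """Return the fraction of rows labelled 1 for each category."""
    total = len(rows)
    return {c: sum(row[c] for row in rows) / total for c in categories}


def categories_without_examples(shares):
    """Categories that never occur, so the model never sees a positive case."""
    return [c for c, share in shares.items() if share == 0]


# Toy labels, invented for illustration only
toy_rows = [
    {"related": 1, "aid_related": 1, "child_alone": 0},
    {"related": 1, "aid_related": 0, "child_alone": 0},
    {"related": 0, "aid_related": 0, "child_alone": 0},
    {"related": 1, "aid_related": 0, "child_alone": 0},
]
toy_categories = ["related", "aid_related", "child_alone"]

shares = category_shares(toy_rows, toy_categories)
empty = categories_without_examples(shares)
```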

How to use it?

Type in a message reporting a disaster problem and click "Classify Message." The app takes you to a result page. If the model assigns your message to any categories, they are highlighted in the "Result" section.
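
Highlighting comes down to pairing each category name with the model's 0/1 output for the message. A minimal sketch, with a hypothetical function name and illustrative category names:

```python
def highlighted_categories(category_names, prediction):
    """Return the names of the categories the model flagged with 1."""
    return [name for name, flag in zip(category_names, prediction) if flag == 1]


# Illustrative names and an invented prediction vector
names = ["related", "aid_related", "weather_related", "child_alone"]
flags = [1, 1, 0, 0]
result = highlighted_categories(names, flags)
```

If every flag is 0, the list is empty and nothing is highlighted on the result page.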

File Description

  • app
    • template
      • master.html # main page of web app
      • go.html # classification result page of web app
    • run.py # Flask file that runs app
  • data
    • disaster_categories.csv # data to process
    • disaster_messages.csv # data to process
    • process_data.py # ETL pipeline that cleans the data and stores it in the database
    • InsertDatabaseName.db # database to save clean data to
  • models
    • train_classifier.py # ML pipeline that trains the classifier and saves it
    • classifier.pkl # saved model

License

This project is licensed under the MIT License.

Author

Zhaoyi Wang

About

Udacity data engineer project
