This repository contains a machine learning project that classifies whether a tweet is about a real disaster or not. It uses natural language processing (NLP) techniques and a trained ML model to make predictions.
The goal is to build a classifier that accurately identifies disaster-related tweets, helping authorities, responders, and organizations prioritize responses in real time.
- Source: Kaggle - Real or Not? NLP with Disaster Tweets
- Fields:
  - id: unique identifier for each tweet
  - text: the content of the tweet
  - target: 1 if the tweet refers to a real disaster, 0 if not
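As a rough illustration of the layout described above, the following sketch loads a few rows with Pandas and counts the two classes. The rows are made up for the example and are not taken from the Kaggle data; only the column names `id`, `text`, and `target` come from the field list.

```python
import io

import pandas as pd

# Made-up rows in the id/text/target layout (assumption: the real file
# downloaded from Kaggle would be read from disk instead).
sample_csv = io.StringIO(
    "id,text,target\n"
    '1,"Forest fire near the highway, roads closed",1\n'
    '2,"This new album is fire",0\n'
    '3,"Flooding reported downtown after heavy rain",1\n'
)

df = pd.read_csv(sample_csv)

# Count how many tweets fall into each class (1 = disaster, 0 = not).
class_counts = df["target"].value_counts().to_dict()
print(df.head())
print(class_counts)
```

Checking the class balance early is useful because it affects how accuracy should be read later on.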
- Text cleaning and preprocessing (removing URLs, stopwords, etc.)
- Tokenization and vectorization (TF-IDF or CountVectorizer)
- Machine Learning model (e.g., Logistic Regression, Random Forest, etc.)
- Streamlit web interface for real-time tweet classification
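The first three steps above can be sketched as a single scikit-learn pipeline. This is a minimal sketch, not necessarily the code used in this repository: the `clean_tweet` helper, the tiny `STOPWORDS` set, and the example tweets are all hypothetical, and TF-IDF with Logistic Regression is just one of the combinations listed.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative stopword list (assumption; NLTK or spaCy offer fuller ones).
STOPWORDS = {"a", "an", "the", "is", "in", "on", "of", "to", "and", "my"}


def clean_tweet(text):
    """Lowercase the tweet, strip URLs and non-letters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    return " ".join(word for word in text.split() if word not in STOPWORDS)


# Made-up examples for illustration only; not taken from the Kaggle dataset.
tweets = [
    "Forest fire spreading near the town, evacuation ordered http://t.co/x",
    "Earthquake shakes the city, buildings collapsed",
    "Flood waters rising, residents rescued by boat",
    "I love this sunny weather in the park",
    "My new phone is on fire, the battery life is amazing",
    "Watching a movie with friends tonight",
]
labels = [1, 1, 1, 0, 0, 0]

# The vectorizer applies clean_tweet before tokenizing, so cleaning,
# vectorization, and classification run as one fitted object.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_tweet),
    LogisticRegression(),
)
model.fit(tweets, labels)

predictions = model.predict(["Wildfire evacuation ordered for the town"])
print(predictions)
```

Keeping the cleaning step inside the pipeline means the same preprocessing is applied at training time and at prediction time, which is what a web interface needs when it classifies a single typed tweet.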
- Python
- Pandas, NumPy
- Scikit-learn
- NLTK / spaCy (for NLP)
- Streamlit (for frontend app)
- Jupyter Notebook (for model development and testing)
- Clone the repository:
git clone https://github.com/Yash-Lade/Disaster-Tweets-Classifier.git
cd Disaster-Tweets-Classifier
- (Optional) Create a virtual environment and activate it:
python -m venv env
source env/bin/activate # or env\Scripts\activate on Windows
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Enter a tweet in the web interface.
- The model classifies the tweet as Disaster or Not Disaster.
The model is evaluated using accuracy, precision, recall, and F1-score.
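To make the four metrics concrete, here is a small worked sketch that computes them from true and predicted labels in plain Python. The `classification_metrics` helper and the label lists are hypothetical and exist only for this example; they are not results from this project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = disaster)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics` compute the same quantities. For this task, recall matters when missing a real disaster tweet is costly, while precision matters when false alarms are costly.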
Feel free to fork the repo, make improvements, and create pull requests!
This project is licensed under the MIT License.
Author: Yash Lade