Welcome to the SMS/Email Spam Classifier project! This project is designed to help you identify whether a given SMS or email message is spam or not, using a machine learning model that I built and deployed as a Streamlit app.
- Data Cleaning: The data was cleaned to ensure that only relevant and high-quality information was fed into the model.
- Exploratory Data Analysis (EDA): In-depth analysis of the data was performed to understand patterns and distributions.
- Text Preprocessing: Tokenization, stemming, and stopwords removal were applied to transform the raw text.
- Model Building: Several classifiers were compared, and Multinomial Naive Bayes was selected because it reached 100% precision in evaluation, meaning no legitimate message was flagged as spam.
- Model Improvement: TF-IDF with a maximum of 3000 features was used to enhance accuracy.
- Model Training: The model was trained and fine-tuned for optimal performance.
- Streamlit App Development: A user-friendly web application was built using Streamlit.
- Deployment: The app was deployed and is available for the community.
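The text preprocessing step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `transform_text`, the regex-based tokenizer, and the small inline stopword set are assumptions made to keep the example self-contained (the project may well use NLTK's tokenizer and its full English stopword list, which require downloading NLTK data).

```python
import re

from nltk.stem import PorterStemmer

# Small illustrative stopword set (an assumption, not the project's list).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "you", "have", "and", "of", "in", "for"}

stemmer = PorterStemmer()


def transform_text(text):
    """Lowercase, tokenize, drop stopwords, and stem a raw message."""
    # Tokenization: keep runs of letters and digits, discard punctuation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stopword removal: drop very common words that carry little signal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming: reduce each word to its stem with the Porter algorithm.
    return " ".join(stemmer.stem(t) for t in tokens)


print(transform_text("You have WON a free prize! Claim the winnings now"))
```

The cleaned string is what would then be passed to the vectorizer, so the same function has to be applied both at training time and to every message typed into the app.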
- Real-time Classification: Instantly classify messages as "Spam" or "Not Spam".
- High Precision: The model is tuned to favor precision and reached 100% in evaluation, so it aims to avoid flagging legitimate messages as spam.
- User-Friendly Interface: The Streamlit app is intuitive and easy to use.
- Community Access: The app is deployed and available for public use.
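For readers unfamiliar with the metric: precision is the share of messages flagged as spam that really are spam, so 100% precision means no false positives in the evaluation. It does not say how many spam messages were missed, which is what recall measures. The sketch below uses made-up labels (1 = spam, 0 = not spam) purely to illustrate the difference. The helper names are hypothetical.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that were actually predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: three spam messages, three legitimate ones.
y_true = [1, 1, 1, 0, 0, 0]
# The classifier never flags a legitimate message, but misses one spam.
y_pred = [1, 1, 0, 0, 0, 0]

p = precision(y_true, y_pred)
r = recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")
```

For a spam filter this trade-off is often acceptable, since wrongly hiding a legitimate message is usually more costly than letting an occasional spam message through.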
- Clone the repository:
      git clone https://github.com/imAdnanSaid/sms-classifier.git
      cd sms-classifier
- Install the required packages:
      pip install -r requirements.txt
- Run the Streamlit app:
      streamlit run app.py
- Python: The primary programming language.
- Pandas & NumPy: For data manipulation and analysis.
- Scikit-learn: For building and evaluating the machine learning model.
- NLTK: For text preprocessing.
- Streamlit: For building and deploying the web application.
The Multinomial Naive Bayes model, combined with TF-IDF vectorization (up to 3000 features), reached 100% precision in evaluation, so a message it flags as spam is very likely to be spam.
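As a rough sketch of how such a model can be put together with scikit-learn, the snippet below chains `TfidfVectorizer(max_features=3000)` with `MultinomialNB`. The toy messages and labels are invented for illustration and are far too few to reproduce the reported score. Using a single pipeline object is also an assumption, since the project may store the vectorizer and the model separately.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data invented for illustration; 1 = spam, 0 = not spam.
train_texts = [
    "win a free prize now",
    "free entry claim your cash prize",
    "urgent call now to claim your reward",
    "are we still meeting for lunch today",
    "can you send me the report tomorrow",
    "see you at home this evening",
]
train_labels = [1, 1, 1, 0, 0, 0]

# TF-IDF keeps at most 3000 terms; Multinomial Naive Bayes classifies the vectors.
model = make_pipeline(TfidfVectorizer(max_features=3000), MultinomialNB())
model.fit(train_texts, train_labels)

# Toy held-out messages, also invented.
test_texts = ["claim your free prize", "lunch tomorrow at home"]
test_labels = [1, 0]
predicted = model.predict(test_texts)

# Precision on the spam class; zero_division=0 avoids a warning if nothing is flagged.
precision = precision_score(test_labels, predicted, zero_division=0)

for text, label in zip(test_texts, predicted):
    print(text, "->", "Spam" if label == 1 else "Not Spam")
print("precision:", precision)
```

In the real project the input texts would first go through the preprocessing step, and the evaluation would use a proper held-out split of the full dataset.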
The app is deployed on Streamlit and accessible here.
Contributions are welcome! If you have suggestions for improvements or new features, feel free to open an issue or submit a pull request.
- Special thanks to the Streamlit community for their support and resources.
- The datasets used for training the model were sourced from open data repositories.