Description
Deep Learning Simplified Repository (Proposing new issue)
🔴 Project Title : SMS Spam Detection Using NLP
🔴 Aim : Develop an NLP-based model to classify SMS messages as either "ham" (legitimate) or "spam."
🔴 Dataset : https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset/data
🔴 Approach :
1. Exploratory Data Analysis (EDA):
Perform initial data inspection, missing value analysis, and data distribution checks.
Visualize data to understand the frequency of ham vs. spam messages and other relevant patterns.
2. Preprocessing:
Text normalization, including case conversion, stop word removal, punctuation removal, and stemming/lemmatization as needed.
3. Model Building:
Implement multiple algorithms to compare their performance:
Naive Bayes
Support Vector Machine (SVM)
Random Forest
Logistic Regression or any other suitable algorithm
Evaluate models using accuracy scores and select the best-fit algorithm for this dataset.
4. Evaluation & Comparison:
Document accuracy and performance metrics for each model to identify the optimal algorithm for this problem.
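The preprocessing, model building, and evaluation steps above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's implementation: the stop-word list, the `normalize` and `evaluate` helpers, the `NaiveBayes` class, and the toy messages are all hypothetical and are not drawn from the Kaggle dataset. A real submission would more likely load the dataset with pandas and compare scikit-learn classifiers. Precision and recall are reported alongside accuracy because accuracy alone can look good when one class dominates.

```python
import math
import string
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "your", "for", "at", "are", "we", "i"}


def normalize(text):
    """Lowercase, strip punctuation, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter(labels)      # label -> number of messages
        for msg, label in zip(messages, labels):
            self.word_counts[label].update(normalize(msg))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        tokens = normalize(message)
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


def evaluate(model, messages, labels, positive="spam"):
    """Return accuracy, precision, and recall for the positive class."""
    preds = [model.predict(m) for m in messages]
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    return {
        "accuracy": sum(p == y for p, y in pairs) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy messages invented for this sketch (not taken from the dataset).
train = [
    ("Win a free prize now, claim your cash", "spam"),
    ("Free entry! Text WIN to claim your reward", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("I will call you when I get home", "ham"),
]
test = [("Claim your free cash prize", "spam"), ("See you at lunch", "ham")]

model = NaiveBayes().fit([m for m, _ in train], [y for _, y in train])
metrics = evaluate(model, [m for m, _ in test], [y for _, y in test])
print(metrics)
```

The same `evaluate` helper could be reused for each candidate model, so that every algorithm is compared on identical held-out messages.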
📍 Follow the Guidelines to Contribute in the Project :
- You need to create a separate folder named as the Project Title.
- Inside that folder, there will be four main components.
- Images - To store the required images.
- Dataset - To store the dataset, or information about the dataset and its source.
- Model - To store the machine learning model you've created using the dataset.
- requirements.txt - This file will contain the packages/libraries required to run the project on other machines.
- Inside the Model folder, the README.md file must be filled in properly, with proper visualizations and conclusions.
🔴🟡 Points to Note :
- Issues will be assigned on a first-come, first-served basis, 1 Issue == 1 PR.
- "Issue Title" and "PR Title" should be the same. Include the issue number along with it.
- Follow the Contributing Guidelines & Code of Conduct before you start contributing.
✅ To be Mentioned while taking the issue :
- Full name : Pranay Deep Korada
- GitHub Profile Link : https://github.com/pranaydeep1
- Email ID : pranaydeep591@gmail.com
- Participant ID (if applicable):
- Approach for this Project :
1. Exploratory Data Analysis (EDA):
Perform initial data inspection, missing value analysis, and data distribution checks.
Visualize data to understand the frequency of ham vs. spam messages and other relevant patterns.
2. Preprocessing:
Text normalization, including case conversion, stop word removal, punctuation removal, and stemming/lemmatization as needed.
3. Model Building:
Implement multiple algorithms to compare their performance:
Naive Bayes
Support Vector Machine (SVM)
Random Forest
Logistic Regression or any other suitable algorithm
Evaluate models using accuracy scores and select the best-fit algorithm for this dataset.
4. Evaluation & Comparison:
Document accuracy and performance metrics for each model to identify the optimal algorithm for this problem.
- What is your participant role? (Mention the Open Source program) GSSOC ext- Participant
Happy Contributing 🚀
All the best. Enjoy your open source journey ahead. 😎