SMS Spam Detection Using NLP #979

@pranaydeep1

Deep Learning Simplified Repository (Proposing new issue)

🔴 Project Title : SMS Spam Detection Using NLP

🔴 Aim : Develop an NLP-based model to classify SMS messages as either "ham" (legitimate) or "spam."

🔴 Dataset : https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset/data

🔴 Approach :

  1. Exploratory Data Analysis (EDA):
    • Perform initial data inspection, missing value analysis, and data distribution checks.
    • Visualize data to understand the frequency of ham vs. spam messages and other relevant patterns.
  2. Preprocessing:
    • Text normalization, including case conversion, stop word removal, punctuation removal, and stemming/lemmatization as needed.
  3. Model Building:
    • Implement multiple algorithms to compare their performance:
      • Naive Bayes
      • Support Vector Machine (SVM)
      • Random Forest
      • Logistic Regression or any other suitable algorithm
    • Evaluate models using accuracy scores and select the best-fit algorithm for this dataset.
  4. Evaluation & Comparison:
    • Document accuracy and performance metrics for each model to identify the optimal algorithm for this problem.
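
The preprocessing and Naive Bayes steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the proposed implementation: the stop-word list, the `normalize` helper, the `NaiveBayes` class, and the four toy messages are all assumptions made up for the example, and stemming/lemmatization is left out. A real submission would more likely load the Kaggle CSV and use an established library.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "the", "to", "is", "are", "you", "i",
              "at", "for", "and", "or", "of", "in", "on"}


def normalize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for message, label in zip(messages, labels):
            self.word_counts[label].update(normalize(message))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, message):
        # Words never seen during training are ignored.
        tokens = [tok for tok in normalize(message) if tok in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy messages, NOT taken from the dataset.
train_messages = [
    "WIN a FREE prize now, call now!",
    "Free entry: claim your cash prize",
    "Are you coming home for dinner?",
    "See you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_messages, train_labels)
prediction_spam = model.predict("Claim your free prize")
prediction_ham = model.predict("Dinner at home tomorrow?")
```

The other listed algorithms (SVM, Random Forest, Logistic Regression) would plug into the same normalize-then-classify flow, which keeps the comparison between models fair.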


📍 Follow the Guidelines to Contribute in the Project :

  • You need to create a separate folder named after the Project Title.
  • Inside that folder, there will be four main components.
    • Images - To store the required images.
    • Dataset - To store the dataset, or information about its source.
    • Model - To store the machine learning model you've created using the dataset.
    • requirements.txt - This file will contain the packages/libraries required to run the project on other machines.
  • Inside the Model folder, the README.md file must be filled in properly, with proper visualizations and conclusions.
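
As a rough illustration of the layout described above, the following sketch creates the four components plus the Model README. The `scaffold_project` helper is hypothetical, and the example writes into a temporary directory only so that it is safe to run anywhere; in the repository you would create the folder at the appropriate location instead.

```python
import tempfile
from pathlib import Path


def scaffold_project(root, title="SMS Spam Detection Using NLP"):
    """Create the folder layout from the contribution guidelines under `root`."""
    project = Path(root) / title
    for folder in ("Images", "Dataset", "Model"):
        (project / folder).mkdir(parents=True, exist_ok=True)
    # Empty placeholders, to be filled in by the contributor.
    (project / "requirements.txt").touch()
    (project / "Model" / "README.md").touch()
    return project


with tempfile.TemporaryDirectory() as tmp:
    project = scaffold_project(tmp)
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
```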

🔴🟡 Points to Note :

  • The issues will be assigned on a first-come, first-served basis, 1 Issue == 1 PR.
  • "Issue Title" and "PR Title" should be the same. Include the issue number along with it.
  • Follow the Contributing Guidelines & Code of Conduct before you start contributing.

To be Mentioned while taking the issue :

  • Full name : Pranay Deep Korada
  • GitHub Profile Link : https://github.com/pranaydeep1
  • Email ID : pranaydeep591@gmail.com
  • Participant ID (if applicable):
  • Approach for this Project :
    1. Exploratory Data Analysis (EDA):
      • Perform initial data inspection, missing value analysis, and data distribution checks.
      • Visualize data to understand the frequency of ham vs. spam messages and other relevant patterns.
    2. Preprocessing:
      • Text normalization, including case conversion, stop word removal, punctuation removal, and stemming/lemmatization as needed.
    3. Model Building:
      • Implement multiple algorithms to compare their performance:
        • Naive Bayes
        • Support Vector Machine (SVM)
        • Random Forest
        • Logistic Regression or any other suitable algorithm
      • Evaluate models using accuracy scores and select the best-fit algorithm for this dataset.
    4. Evaluation & Comparison:
      • Document accuracy and performance metrics for each model to identify the optimal algorithm for this problem.
  • What is your participant role? (Mention the Open Source program) GSSOC ext- Participant
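
For the Evaluation & Comparison step in the approach above, it may help to report precision, recall, and F1 for the spam class alongside accuracy, since ham messages considerably outnumber spam in this dataset. The sketch below is only an illustration: the `classification_metrics` helper and the two sets of predictions are made up for the example and are not results from any trained model.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Return accuracy plus precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 8 ham and 2 spam messages.
y_true = ["ham"] * 8 + ["spam"] * 2

# Hypothetical predictions from two "models" (illustrative only).
predictions = {
    "always_ham": ["ham"] * 10,
    "model_a": ["ham"] * 7 + ["spam", "spam", "ham"],
}

results = {name: classification_metrics(y_true, y_pred)
           for name, y_pred in predictions.items()}

for name, metrics in results.items():
    print(name, metrics)
```

In this toy case both candidates reach the same accuracy, yet only one of them ever flags a spam message, which is why accuracy alone can be a weak basis for picking the best-fit algorithm.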

Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎
