NLP Sentiment Analysis Project

This project demonstrates a sentiment analysis pipeline built from several natural language processing (NLP) techniques and machine learning models. The primary objective is to classify movie reviews as either Positive or Negative, using models built on the following text representations:

Bag of Words (BoW)
TF-IDF
N-gram
Word2Vec
FastText

Each model is trained to predict sentiment independently, and the final output is the majority vote across the models' predictions.
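The voting step can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's actual code: the function name majority_vote and the per-model predictions in votes are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    if not predictions:
        raise ValueError("need at least one prediction")
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical per-model outputs for a single review (assumed values).
votes = {
    "bow": "positive",
    "tfidf": "positive",
    "ngram": "negative",
    "word2vec": "positive",
    "fasttext": "negative",
}

final_label = majority_vote(list(votes.values()))
print(final_label)  # positive (3 votes against 2)
```

With five models and two labels, a tie is impossible, so no tie-breaking rule is needed.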

Dataset

The dataset used in this project is the IMDB Movie Reviews Dataset, consisting of 50,000 movie reviews labeled as positive or negative.

Download the Dataset

You can download the dataset from Kaggle:

IMDB Dataset on Kaggle

Place the dataset in your working directory, or in Google Drive if you are running the project in Colab.
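Once the file is in place, loading it might look like the sketch below. It assumes the CSV has a header row with the columns review and sentiment, as the Kaggle file does at the time of writing; the helper name load_reviews and the 1/0 label encoding are choices made for this example only.

```python
import csv
import io


def load_reviews(csv_file):
    """Read review texts and 1/0 labels from an open CSV file object.

    Assumes columns named "review" and "sentiment" (an assumption about
    the file layout); "positive" maps to 1, anything else to 0.
    """
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["review"])
        labels.append(1 if row["sentiment"].strip().lower() == "positive" else 0)
    return texts, labels


# Tiny in-memory stand-in for the real file, so the sketch runs anywhere.
sample = io.StringIO(
    "review,sentiment\n"
    '"A moving, beautifully shot film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
)
texts, labels = load_reviews(sample)
print(labels)  # [1, 0]

# With the real dataset (the path is an assumption about your setup):
# with open("IMDB Dataset.csv", newline="", encoding="utf-8") as f:
#     texts, labels = load_reviews(f)
```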

Models Implemented

Bag of Words (BoW): Converts text to feature vectors using word counts.
TF-IDF: Weights each term by its frequency within a review (term frequency), scaled down by how many reviews contain it (inverse document frequency).
N-gram: Uses word pairs (bigrams) as features to capture context.
Word2Vec: Embedding model that learns vector representations of words based on their usage context.
FastText: Similar to Word2Vec, but represents each word through its character n-grams (sub-word information), which helps with rare or unseen words and makes it useful for morphologically rich languages.
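To make the count-based features concrete, here is a small standard-library sketch of word counts, bigrams, a textbook TF-IDF weight, and FastText-style character n-grams. All function names are hypothetical, the TF-IDF formula is the plain tf * log(N / df) variant (libraries often apply smoothing, so their numbers will differ), and Word2Vec is omitted because it requires training an embedding model.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count how often each word occurs in one document."""
    return Counter(tokens)


def word_bigrams(tokens):
    """Return adjacent word pairs, which keep some local word order."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(term, doc_tokens, corpus):
    """Textbook TF-IDF: relative term frequency times log(N / df)."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(len(corpus) / df) if df else 0.0


def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, in the style of FastText."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


tokens = "not good not great".split()
print(bag_of_words(tokens))  # Counter({'not': 2, 'good': 1, 'great': 1})
print(word_bigrams(tokens))  # [('not', 'good'), ('good', 'not'), ('not', 'great')]
print(char_ngrams("good"))   # ['<go', 'goo', 'ood', 'od>']

corpus = [["good", "film"], ["bad", "film"]]
print(tf_idf("film", corpus[0], corpus))  # 0.0, since "film" is in every document
```

The bigram ('not', 'good') shows why N-gram features can capture negation that single-word counts miss.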

How It Works

The dataset is preprocessed by tokenizing, lowercasing, and removing stop words.
Each model is trained on the preprocessed data.
At prediction time, the input text is passed through each trained model to predict sentiment (positive or negative).
The final sentiment is determined by majority voting from the models' predictions.
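The preprocessing step above can be sketched with the standard library alone. This is an illustration under assumptions: the stop-word list is a tiny hand-picked sample rather than the list the project uses, the function name preprocess is hypothetical, and the HTML tag stripping is an extra step added here because IMDB reviews contain <br /> markup.

```python
import re

# A small illustrative stop-word list (assumption); a real pipeline would
# typically use a fuller list from an NLP library.
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}
)


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, tokenize, and drop stop words from one review."""
    text = re.sub(r"<[^>]+>", " ", text)              # strip HTML tags such as <br />
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # lowercase, then tokenize
    return [t for t in tokens if t not in stop_words]


print(preprocess("This movie was GREAT!<br />A joy to watch."))
# ['movie', 'great', 'joy', 'watch']
```

Negation words such as "not" are deliberately left out of the stop-word list in this sketch, since removing them can flip the apparent sentiment of a review.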

File Structure

├── nlp.py             # Main script for training models and running predictions
├── IMDB Dataset.csv   # Dataset (ensure it's downloaded and available)
└── README.md          # This file
