
Covid19-Sentiment-Analysis of Unlabelled Tweets

Introduction

Sentiment analysis is the process of determining whether a piece of writing is positive, negative, or neutral. It is a Natural Language Processing technique. Natural language processing (NLP) is a branch of linguistics, computer science, and artificial intelligence that studies how computers and human language interact, with a focus on how to train computers to process and analyse massive volumes of natural language data. The ultimate goal is a machine that can "understand" the contents of documents, including the subtleties of language used in different contexts. Once information and insights are accurately extracted from documents, the technology can classify and organise the documents themselves.
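As a toy illustration of the idea (not this project's method), sentiment can be scored by counting words from small positive and negative word lists; the lexicons below are hypothetical examples:

```python
# Minimal lexicon-based sentiment sketch (illustrative only; these
# word lists are made-up examples, not the project's actual lexicon).
POSITIVE = {"good", "great", "safe", "recovered", "hope"}
NEGATIVE = {"bad", "sick", "death", "fear", "lockdown"}

def sentiment(text: str) -> str:
    # Lowercase, split on whitespace, and strip trailing punctuation.
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Glad my family recovered, there is hope"))  # positive
print(sentiment("So much fear during this lockdown"))        # negative
```

Real systems replace the hand-written lists with learned models, which is what this project does below.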

Getting Started

Open a terminal in the folder where you want the project. Then execute the following commands one at a time.

  git clone https://github.com/rohit-khoiwal-30/Covid19-Sentiment-Analysis.git
  cd Covid19-Sentiment-Analysis
  virtualenv env
  env\Scripts\activate   # Windows; on macOS/Linux use: source env/bin/activate
  pip install -r requirements.txt
  cd server
  flask run

Then open the app folder, double-click index.html, and enjoy the app.

Problem-Statement:

Twitter hosts a sizable corpus of tweets about Covid-19. We wish to determine how many people hold positive and how many hold negative views about the COVID-19 pandemic.

Solution:

  1. We download Twitter's raw tweets onto our system. Hydrate Tweets
  2. We must clean and preprocess the tweets before using them.
  3. Then tweet features can be extracted using the TF-IDF vectorizer.
  4. Since the data is unlabelled, we must label it somehow in order to use supervised learning. Go to the notebook.
  5. After labelling, we extract features using CountVectorizer from scikit-learn.
  6. For classification, we employ a Naive Bayes classifier.
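The steps above can be sketched with scikit-learn. The unsupervised labelling step (step 4) is shown here as KMeans clustering over TF-IDF features, which is an assumption on our part; the repository's notebook defines the actual labelling procedure. The tweets are made-up examples.

```python
# Sketch of the pipeline: clean -> TF-IDF -> pseudo-label -> CountVectorizer -> Naive Bayes.
import re

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

tweets = [
    "Stay safe everyone, we will beat covid together!",
    "Hospitals are overwhelmed, this is terrifying",
    "Vaccines bring hope, great news today",
    "Another lockdown, so tired of this pandemic",
]

# Step 2: clean and preprocess (strip URLs, mentions, and non-letters).
def clean(t):
    t = re.sub(r"http\S+|@\w+", " ", t.lower())
    return re.sub(r"[^a-z\s]", " ", t)

cleaned = [clean(t) for t in tweets]

# Step 3: extract TF-IDF features.
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(cleaned)

# Step 4: pseudo-label via clustering (an assumed stand-in for the notebook).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_tfidf)

# Steps 5-6: bag-of-words features + Naive Bayes trained on the pseudo-labels.
X_counts = CountVectorizer(stop_words="english").fit_transform(cleaned)
clf = MultinomialNB().fit(X_counts, labels)
print(clf.predict(X_counts))
```

Training on TF-IDF for clustering but raw counts for Naive Bayes mirrors the two separate feature-extraction steps listed above.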

Conclusion:

Using an unsupervised learning technique, we labelled the data and then classified tweets based on those labels. Because we trained the model on 100,000 (1 lakh) tweets, we obtained quite decent accuracy, with only occasional misclassifications.

References:

  1. We use a dataset from a GitHub repository. Here
  2. Preprocessing and feature extraction. Here
