Sentiment_Analysis_Multilingual_Corpora

A generic approach to the supervised sentiment analysis of social media content in foreign languages.

The method translates documents from the original language into English with Google's Neural Translation Model. The resulting texts are then converted to vectors by averaging the vector representations of their words. Tested with several machine learning classifiers on Swedish, Polish, Slovenian and Croatian Twitter datasets, the approach reaches up to 86% classification accuracy on out-of-sample data.
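
The averaging step can be illustrated with a minimal sketch. It assumes a word-to-vector lookup is already available; the `toy_embeddings` dictionary, the `average_vector` helper and the 3-dimensional vectors are made up for illustration, and this README does not say which pretrained embeddings the notebook uses. Skipping out-of-vocabulary words and returning a zero vector for empty documents are also assumptions, not confirmed details of the method.

```python
import numpy as np


def average_vector(tokens, embeddings, dim):
    """Return the mean vector of the known tokens, or zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # Assumption: documents with no known words map to the zero vector.
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Toy 3-dimensional embeddings, invented for this example only.
toy_embeddings = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}

# "tonight" is not in the lookup, so it is skipped.
doc_vector = average_vector("good movie tonight".split(), toy_embeddings, dim=3)
print(doc_vector)  # [2. 1. 1.]
```

Each translated tweet becomes one fixed-length vector this way, which is what lets standard classifiers be applied regardless of the tweet's original language.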

Preprocessing steps include:

data source: pre-labelled tweets in Croatian from CLARIN.SI. The tweet annotation procedure is described in Mozetič, I., Grčar, M., & Smailović, J. (2016).
translation: tweets are translated with the Cloud Translation API in Python, using the google_api_translate library.
data preprocessing: removing links, punctuation, digits and emojis, and lowercasing the words.
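
The cleaning step listed above could look roughly like the sketch below. The function name `clean_tweet`, the regular expressions and the emoji code-point ranges are assumptions made for this example, not the notebook's actual code. The emoji ranges are approximate, and `string.punctuation` covers ASCII punctuation only.

```python
import re
import string

# Links starting with http(s):// or www.
URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
# Approximate emoji and pictograph ranges; not exhaustive.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")
# Translation table that deletes ASCII punctuation and digits.
PUNCT_DIGIT_TABLE = str.maketrans("", "", string.punctuation + string.digits)


def clean_tweet(text: str) -> str:
    """Remove links, emojis, punctuation and digits, then lowercase."""
    text = URL_RE.sub(" ", text)      # links first, before their punctuation is stripped
    text = EMOJI_RE.sub(" ", text)
    text = text.translate(PUNCT_DIGIT_TABLE)
    text = text.lower()
    return " ".join(text.split())     # collapse repeated whitespace


print(clean_tweet("Great DAY!!! \U0001F600 http://t.co/abc 123"))  # great day
```

Links are removed before punctuation so that a URL is dropped as a whole rather than left behind as letter fragments.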

The methodological set-up is as follows:

You can skip the first 4 steps by using the pre-calculated vectors in vectors_croatian.csv.

The results for the different methods are shown in the table:

model	accuracy	time
DT	0.749064	0.887692
RF	0.863296	13.895610
SVM	0.861423	11.448037
KNN	0.794007	0.759187
LRM	0.846442	0.057601
ANN	0.855805	2.007087
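
A comparison of this kind can be produced with a small loop that records accuracy and elapsed time per classifier. The sketch below uses scikit-learn on synthetic data from `make_classification` as a stand-in for the document vectors, so its numbers will not match the table. It covers only three of the six models, and it reads DT, KNN and LRM as decision tree, k-nearest neighbours and logistic regression, which is an assumption. What the "time" column measures is not stated above; here it is fit plus predict time in seconds.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the averaged document vectors and their labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed mapping from the table's abbreviations to scikit-learn estimators.
models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "LRM": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start  # fit + predict, in seconds
    results[name] = (accuracy_score(y_test, predictions), elapsed)

print("model\taccuracy\ttime")
for name, (accuracy, seconds) in results.items():
    print(f"{name}\t{accuracy:.6f}\t{seconds:.6f}")
```

Accuracy is computed on the held-out 20% split, which corresponds to the out-of-sample evaluation mentioned earlier.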

The methodology is explained in detail in our paper: Galeshchuk S., Jourdan J., Qiu J. Sentiment Analysis for Multilingual Corpora. 2019. Accepted to the workshop of ACL'2019. Please cite it if you use the method.
