Created and maintained by Honglin Bao, summer 2021 @ Michigan State Department of Communication, Computational Communication Group. Contact: baohlcs@gmail.com
Computational social science research requires processing massive amounts of textual data, ranging from digital traces in social media research to publication data in Science of Science research. This GitHub repository provides an overview of the techniques most frequently used in computational social science (notably political communication) for handling textual data: scraping to obtain datasets, pre-processing to clean them, and finally, automatic classification.
I cover the following subjects:
- Scrapers: API-based or hand-built tools for collecting data from websites and social media platforms such as Twitter and YouTube (check out the corresponding folder; a minimal API sketch appears below).
- Binary classification of Twitter posts to infer their ideology (Republican or Democrat) (check out the corresponding folder).
- Multi-class classification of social media comments to determine their toxicity levels or sentiments (check out the corresponding folder; a baseline sketch covering both this and the binary task appears below).
- Several advanced techniques for unusual situations, such as insufficient text data or classes with imbalanced amounts of text (refer to the slides; an over-sampling sketch appears below).
- Model evaluation: which metrics should we consider when evaluating a machine learning model? (refer to the slides; a metrics sketch appears below).
- A brief introduction to some famous but heavyweight deep learning models that can achieve highly accurate text classification (refer to the slides; a transformer sketch appears below).
Nota bene: items 1, 2, and 3 are basic operations with accompanying code and detailed comments/explanations; items 4, 5, and 6 are more advanced subjects with a substantial body of literature. Please refer to the slides for details. The short sketches below illustrate one possible approach to each topic.
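As an illustration of item 1, here is a minimal sketch of an API-based scraper. It assumes a Twitter API v2 bearer token and queries the recent-search endpoint; the query string and requested fields are illustrative only, and the corresponding folder contains the complete scrapers.

```python
import requests

# Assumption: you have a Twitter API v2 bearer token (developer.twitter.com).
BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder, not a real credential
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def search_recent_tweets(query, max_results=10):
    """Fetch recent tweets matching `query` via the v2 recent-search endpoint."""
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    params = {
        "query": query,
        "max_results": max_results,         # 10-100 tweets per request
        "tweet.fields": "created_at,lang",  # extra metadata to return
    }
    response = requests.get(SEARCH_URL, headers=headers, params=params)
    response.raise_for_status()
    return response.json().get("data", [])

if __name__ == "__main__":
    for tweet in search_recent_tweets("election lang:en"):
        print(tweet["created_at"], tweet["text"])
```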
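For items 2 and 3, a common baseline (not necessarily the exact model in the folders) is a bag-of-words pipeline: TF-IDF features feeding a linear classifier. The toy texts and labels below are made up; scikit-learn's LogisticRegression handles the multi-class case (sentiments, toxicity levels) with the same code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in data; the real tweets and labels come from the scraped datasets.
texts = [
    "cut taxes and shrink government",
    "protect the second amendment",
    "expand healthcare for all families",
    "invest in climate action now",
]
labels = ["republican", "republican", "democrat", "democrat"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)

# TF-IDF turns raw text into weighted word counts; logistic regression learns
# a linear decision boundary over those features. With more than two label
# values, the same pipeline performs multi-class classification.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(X_train, y_train)
print(model.predict(["lower taxes for small business"]))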
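Among the advanced techniques of item 4, imbalanced classes are often handled by over-sampling the minority class. Below is a minimal sketch with the imbalanced-learn package on made-up data; passing class_weight="balanced" to the classifier is a lighter alternative.

```python
from collections import Counter
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical, heavily imbalanced toy corpus: 4 "neutral" vs 1 "toxic".
texts = ["nice post", "great point", "well said", "thanks a lot", "you are an idiot"]
labels = ["neutral", "neutral", "neutral", "neutral", "toxic"]

X = TfidfVectorizer().fit_transform(texts)

# Randomly duplicate minority-class rows until both classes are the same size.
X_resampled, y_resampled = RandomOverSampler(random_state=0).fit_resample(X, labels)
print(Counter(labels), "->", Counter(y_resampled))
# Counter({'neutral': 4, 'toxic': 1}) -> Counter({'neutral': 4, 'toxic': 4})
```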
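For item 5, the metrics usually reported are accuracy, precision, recall, and F1 (macro-averaged in the multi-class case), plus the confusion matrix. A sketch with scikit-learn on made-up predictions:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up true labels and model predictions for a 3-class sentiment task.
y_true = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neu", "pos", "neg"]

# Per-class precision/recall/F1 plus macro and weighted averages.
print(classification_report(y_true, y_pred))

# Rows = true classes, columns = predicted classes.
print(confusion_matrix(y_true, y_pred, labels=["pos", "neu", "neg"]))
```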
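Finally, for item 6, transformer models such as BERT tend to deliver the highest classification accuracy but are heavyweight to train. The sketch below sidesteps training entirely: it assumes the Hugging Face transformers package is installed and uses its default pretrained sentiment pipeline for inference; fine-tuning on your own labels is the heavier path discussed in the slides.

```python
from transformers import pipeline

# Downloads a default pretrained sentiment model on first use
# (assumption: internet access and the `transformers` package installed).
classifier = pipeline("sentiment-analysis")

comments = ["I love this community!", "This thread is a dumpster fire."]
for comment, result in zip(comments, classifier(comments)):
    print(comment, "->", result["label"], round(result["score"], 3))
```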
Acknowledgment: The Summer Institutes in Computational Social Science 2021 (https://sicss.io/)
Any contributions, discussions, and pull requests are welcome and appreciated.