GitHub - codyworsnop/Gender-Bias-Recognition-in-Political-News-Articles

Gender-Bias-Recognition-in-Political-News-Articles is run using Python3 in the following way:

(1) Download the articles from the relevant links. We cannot provide them here due to copyright concerns. Save the articles in .json format. Use our reader options to randomize the articles, and save the randomized JSON in the Data directory as "articles_random_v3.json". You can change the name, but if you do, you will need to update run_preprocessor.py.
(2) Run run_preprocessor.py to generate the relevant .json files, including the cleaned files and the adjective files.
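
For readers who want to see what the randomization in step (1) amounts to, here is a minimal standalone sketch. It is not the repository's reader code: the function name `randomize_articles`, the fixed seed, and the assumption that the input file holds a top-level JSON list of articles are all illustrative.

```python
import json
import random
from pathlib import Path


def randomize_articles(src, dst, seed=0):
    """Shuffle a JSON list of articles and write the result to dst.

    Assumes (hypothetically) that src contains a top-level JSON list.
    """
    articles = json.loads(Path(src).read_text(encoding="utf-8"))
    if not isinstance(articles, list):
        raise ValueError("expected a top-level JSON list of articles")

    # A seeded generator keeps the shuffle reproducible between runs.
    rng = random.Random(seed)
    rng.shuffle(articles)

    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(json.dumps(articles, indent=2), encoding="utf-8")
    return articles


# Example call (paths are placeholders):
# randomize_articles("articles.json", "Data/articles_random_v3.json")
```

If you save under a different file name, remember that run_preprocessor.py must be updated to match, as noted in step (1).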

DOC2VEC EMBEDDING TESTS

(3a) Download All the News 2.0 from https://components.one/datasets/all-the-news-2-news-articles-dataset/ to the store directory.
(3b) Run run_pretrain_and_finetune.py to replicate our doc2vec embedding tests. This file contains options to pretrain on dirty or clean All the News data and fine-tune on dirty or clean news-bias data. Simply uncomment the line that you wish to run. The default runs the parameters shown in our paper: cleaned All the News data and cleaned news-bias data. All of the metrics (precision, recall, F1) are saved to the metrics directory. All of the models are saved to the PretrainFinetuneStorage directory. All t-SNE visualizations are saved to the visualizations directory.
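
As a reminder of what the saved metrics measure, the sketch below computes precision, recall, and F1 for a single positive class in plain Python. It is an illustration only: the function name and the binary setup are assumptions, and this README does not say how the repository averages metrics across classes.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```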

BAG OF WORDS TESTS

(4) Run run_BOW.py to replicate our bag-of-words results. Four options are provided in this file; uncomment the one you would like to replicate, or set a different combination. To replicate the results obtained on all of the words, run OPTION 1 (the default). To replicate the results obtained on adjectives only, run OPTION 2. If print_vocab = True, the vocabulary is written to the vocabulary directory.
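
The difference between running on all words and running on adjectives only can be pictured with a small bag-of-words sketch. This is not the code in Bow.py: the function name, the simple regex tokenizer, and the optional `vocabulary` argument are assumptions made for illustration.

```python
import re
from collections import Counter


def bag_of_words(documents, vocabulary=None):
    """Return (vocab, vectors) of raw term counts.

    If vocabulary is given (for example, a set of adjectives), only
    those words are counted; otherwise every token is used.
    """
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in documents]
    if vocabulary is None:
        vocab = sorted({tok for toks in tokenized for tok in toks})
    else:
        vocab = sorted(set(vocabulary))

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # missing words count as 0
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors
```

Passing a restricted vocabulary shrinks each vector to the chosen words, which is the general idea behind an adjectives-only run.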

SENTIMENT ANALYSIS TESTS

(5a) Download the Hu and Liu (2004) sentiment lexicon from http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar, and remove 'trump', 'vice', and 'right' to remain consistent with our cleaning methods. Save the cleaned files as "positive-words-notrump.txt" and "negative-words-novice.txt" in the same folder the originals were downloaded to.
(5b) Run run_sentiment.py
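
The lexicon cleaning in step (5a) can be done by hand or with a short script such as the sketch below. It rests on several assumptions that you should check against your download: one word per line, header comment lines starting with ";", and latin-1 encoding. The function name `clean_lexicon` is hypothetical, and the script simply drops all three terms from whichever file it is given.

```python
from pathlib import Path

# Terms this README says to remove from the lexicon.
REMOVED_TERMS = {"trump", "vice", "right"}


def clean_lexicon(src, dst, removed=REMOVED_TERMS, encoding="latin-1"):
    """Copy a word list from src to dst without the removed terms.

    Assumes one word per line; blank lines and lines starting with ';'
    (assumed to be header comments) are skipped.
    """
    kept = []
    for line in Path(src).read_text(encoding=encoding).splitlines():
        word = line.strip()
        if not word or word.startswith(";"):
            continue
        if word.lower() in removed:
            continue
        kept.append(word)
    Path(dst).write_text("\n".join(kept) + "\n", encoding=encoding)
    return kept


# Example calls (input file names are placeholders):
# clean_lexicon("positive-words.txt", "positive-words-notrump.txt")
# clean_lexicon("negative-words.txt", "negative-words-novice.txt")
```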

Graphs for ALL tests are saved to the visualizations directory.

Models for ALL tests are saved to the store directory.

DATA CLEANING

We use a combination of Bolukbasi et al. (2016) and Zhao et al. (2018) to create our stopwords list, with the changes described below.

The following gendered words have been removed from the Zhao gendered word list due to gender ambiguity in America, their use as a verb, or their lack of usefulness to our application.

Male words (including their plural forms):
"wizard"
"actor"
"host"
"governor"
"hero"
"deer"
"bull"
"colt"
"gelding"
"waiter"
"sorcerer"
"barbershop"
"dude"
"salesman"
"god"
"lion"

Female words (also removed "women" and "woman" forms):

"female_ejaculation"
"hair_salon"
"viagra"
"hen"
"doe"
"filly"
"mare"
"cow"

We add the following words:
"ms."
"misses"
"missus"
"mister"
"gynecologist"

A full list of all stopwords can be found in StopWords.py
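
The removals and additions listed above amount to simple set operations on the base word list. The sketch below illustrates this with a tiny made-up base list; the function name is hypothetical, plural forms are passed in explicitly rather than generated, and the authoritative list remains the one in StopWords.py.

```python
def adjust_gendered_words(base_words, removed, added):
    """Return the base list minus `removed` plus `added`, lowercased and sorted."""
    result = {w.lower() for w in base_words}
    result -= {w.lower() for w in removed}
    result |= {w.lower() for w in added}
    return sorted(result)


# Tiny illustrative excerpt, NOT the real Zhao et al. list.
base = ["he", "she", "king", "actor", "actors", "hen", "hens"]
removed = ["actor", "actors", "hen", "hens"]
added = ["ms.", "misses", "missus", "mister", "gynecologist"]

adjusted = adjust_gendered_words(base, removed, added)
```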

Any word containing "man" is rewritten by regex so that "man" becomes "person"; e.g., "cameraman" becomes "cameraperson". Our full substitution list can be seen in RegexSearchPatterns.py.
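
A substitution of this kind might look like the following sketch. The stem list here is hypothetical and deliberately short; restricting the pattern to known stems avoids rewriting words such as "woman" or "manager". The repository's actual patterns are the ones in RegexSearchPatterns.py and may differ.

```python
import re

# Hypothetical stems for illustration; not the repository's list.
STEMS = ("camera", "chair", "congress", "spokes", "business")

# Match a listed stem followed by "man" as a whole word, ignoring case.
MAN_SUFFIX = re.compile(r"\b(" + "|".join(STEMS) + r")man\b", flags=re.IGNORECASE)


def neutralize(text):
    """Replace '<stem>man' with '<stem>person', keeping the stem's casing."""
    return MAN_SUFFIX.sub(lambda m: m.group(1) + "person", text)
```

This sketch handles only the singular "man" suffix; plural forms would need their own pattern.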

PACKAGES

TODO

Folders and files (224 commits):
.idea
Data
Interfaces
Models
PretrainFinetuneMetrics
Results
__pycache__
folds
utilities
vocabulary
.gitignore
ApplicationConstants.py
Bow.py
DataContracts.py
DataReader.py
Metrics.py
Orchestrator.py
Pipfile
Pipfile.lock
README.md
RegexSearchPatterns.py
Sentiment.py
StopWords.py
Visualizer.py
articles_random_v4_cleaned.json
doc2vec.py
imdb_data.py
model_orchestration.py
parse_sentences.py
preprocessor.py
pretrain_and_finetune.py
run_BOW.py
run_preprocessor.py
run_pretrain_and_finetune.py
run_sentiment.py
