Thomas Woodside
This repository contains an analysis of U.S. congressional election results from 1788 to 1860. It is divided into two main parts: one on gerrymandering, and one on the shift from at-large to single-member districts.
Parsing and data wrangling are done in Python, relying heavily on the Beautiful Soup library. Analysis is mostly done in R.
This analysis was used for a history paper.
The data was digitized from United States Congressional Elections, 1788-1997: The Official Results by Michael J. Dubin (McFarland & Company). The book was scanned and then converted to HTML through onlineocr.net. I then wrote the code in parser.py to convert the HTML output to csv.
Substantial effort was made to digitize the data from the book as accurately as possible; however, some errors likely remain, either from the OCR used to recognize the text or from the program used to convert the text to csv.
The dataset contains at least partial results for 6,619 elections.
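As a rough illustration of the HTML-to-csv step, the sketch below flattens a simple HTML table into csv rows. It is not the code in parser.py: it uses the standard library's html.parser instead of Beautiful Soup so that it runs without extra dependencies, and it assumes the OCR output is a plain table of `<tr>` and `<td>` elements. The sample rows, the `TableRows` class, and the `html_table_to_csv` function are made-up placeholders, not data or names from the repository.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse stray whitespace left over from the scanned layout.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as csv text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample input; the real OCR output is more irregular.
SAMPLE_HTML = """
<table>
  <tr><td>Candidate A</td><td>Party X</td><td>1,234</td></tr>
  <tr><td>Candidate B</td><td>Party Y</td><td>  987 </td></tr>
</table>
"""

CSV_TEXT = html_table_to_csv(SAMPLE_HTML)
print(CSV_TEXT)
```

Vote counts that contain a thousands separator are quoted by the csv writer, so they survive as single fields; converting them to integers would be a separate cleaning step.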
The data contains:
- The number of votes received by each candidate in regular congressional elections held for the 1st to the 39th congresses.
- The results of any runoff elections, if applicable.
The data does not contain the results of special elections: their formatting in the book was too irregular to parse programmatically, and they would probably need to be entered by hand.
An analysis of gerrymandering was the original motivation for digitizing the data and is contained in this repository. See gerrymandering.pdf for the analysis.
The Apportionment Act of 1842 mandated single-member districts for the U.S. House. See multi_member_districts.pdf for possible reasons the mandate was passed.
- example.html is an example of the HTML generated from onlineocr.net.
- filtered.csv contains an extract of the results for each state for each election. You can see how it was generated in gerrymandering.Rmd.
- incumbency_analysis.py takes the output from parser.py and attempts to create new rows determining the incumbency of candidates.
- output_with_incumbency.csv contains the raw data obtained from parsing the book. It is not complete, as the book itself had many elections with missing data.
- parse_out_votes.py parses the voting record from vote_record.html into csv.
- parser.py does the majority of the parsing work, parsing HTML like that found in example.html into csv.
- problem_table.html contains an example of a scan that was improperly converted to HTML. It can be reformatted in flatten_table.html.
- state_area.csv contains the current areas of the 50 states. It is not, of course, completely accurate for early American history, particularly in Massachusetts and Virginia.
- vote_record.csv contains the voting record of all representatives who voted on the single-member districting mandate amendment in 1842.
- vote_record.html contains the raw voting records, from the Library of Congress.
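To give a sense of what the incumbency step involves, here is a minimal sketch. It is not the logic in incumbency_analysis.py: it assumes a candidate counts as an incumbent when someone with the same name won a seat in the same state in the immediately preceding Congress. The field names (`state`, `congress`, `candidate`, `won`), the `mark_incumbents` function, and the sample rows are all hypothetical.

```python
def mark_incumbents(rows):
    """Return copies of `rows` with an added boolean 'incumbent' field.

    Assumption: an incumbent is a candidate whose name appears among the
    winners in the same state in the immediately preceding Congress.
    """
    winners = {
        (r["state"], r["congress"], r["candidate"]) for r in rows if r["won"]
    }
    return [
        {
            **r,
            "incumbent": (r["state"], r["congress"] - 1, r["candidate"]) in winners,
        }
        for r in rows
    ]


# Hypothetical rows; real rows would come from the parsed csv.
ROWS = [
    {"state": "XX", "congress": 1, "candidate": "Candidate A", "won": True},
    {"state": "XX", "congress": 1, "candidate": "Candidate B", "won": False},
    {"state": "XX", "congress": 2, "candidate": "Candidate A", "won": True},
    {"state": "XX", "congress": 2, "candidate": "Candidate B", "won": False},
]

MARKED = mark_incumbents(ROWS)
for row in MARKED:
    print(row["congress"], row["candidate"], row["incumbent"])
```

Exact name matching is fragile on OCR-derived data, since a single misread character makes the same person look like two candidates, so a real pass would likely need some name normalization first.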
Dubin, Michael J. United States Congressional Elections, 1788-1997: The Official Results of the Elections of the 1st through 105th Congresses. Jefferson, North Carolina: McFarland & Company, 1998.
House Journal. 27th Cong., 2nd sess., 3 May 1842, 779.
Calabrese, Stephen. "An Explanation of the Continuing Federal Government Mandate of Single-Member Congressional Districts." Public Choice 130, no. 1/2 (January 2007): 23-40. JSTOR.
Crain, W. Mark. "On the Structure and Stability of Political Markets." Journal of Political Economy 85, no. 4 (August 1977): 829-842. JSTOR.