Input a list of terms --> Scrape Quizlet for definitions of each term --> Output a markdown file.
I used this program to automate defining the 1279 AP US History terms that I was assigned as summer homework before my junior year of high school.
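- Python packages `selenium`, `numpy`, `re`, `random`, `time`, and `sys` are required to run. Of these, only `selenium` and `numpy` need to be installed separately; the rest are part of the Python standard library.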
`quizlet_termscraper_webdriver.py` scrapes Quizlet for the definitions of each term in a list, using the Selenium WebDriver. It outputs a markdown file, which can later be converted to a PDF. Run with:
cat terms | awk '{$1=$1};1' | python quizlet_termscraper_webdriver.py driver_address course_name prioritize_definitions_method maximum_number_of_definitions > terms_definitions.md
where:
- `terms` is a list of terms. See samples/terms.txt for reference.
- `driver_address` is the file path to the Selenium Chrome WebDriver, which can be downloaded at (https://chromedriver.chromium.org/downloads). Note that the ChromeDriver version must match the version of Chrome installed on the computer. This argument is required.
- `course_name` is the name of the course to which `terms` relates (in my case, A.P. United States History). Preferably, provide the most commonly used name for the course (APUSH). This argument is required (though you could provide it as).
- `prioritize_definitions_method` is the method for prioritizing the definitions found for a term across various Quizlet sets; either `long` (to sort longer definitions first) or `short` (to sort shorter definitions first). This argument is required.
- `maximum_number_of_definitions` is the maximum number of definitions to output per term (`-1` to output all definitions found). Defaults to `-1`. This argument is optional.
- `terms_definitions.md` is the output file. See samples/terms_definitions.md for reference.
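To make the `awk` clean-up step and the two prioritization arguments concrete, here is a minimal sketch of how they could behave. It is not taken from `quizlet_termscraper_webdriver.py`; the function names `normalize_term` and `prioritize_definitions` are hypothetical, and it assumes that "longer" and "shorter" mean character count.

```python
def normalize_term(line):
    """Trim and collapse whitespace, roughly what awk '{$1=$1};1' does per line."""
    return " ".join(line.split())


def prioritize_definitions(definitions, method, maximum=-1):
    """Sort definitions by length and keep at most `maximum` of them.

    Hypothetical helper: assumes length is measured in characters.
    """
    if method not in ("long", "short"):
        raise ValueError("method must be 'long' or 'short'")
    # sorted() is stable, so definitions of equal length keep their input order.
    ranked = sorted(definitions, key=len, reverse=(method == "long"))
    # A negative maximum (such as -1) means "keep every definition found".
    return ranked if maximum < 0 else ranked[:maximum]


if __name__ == "__main__":
    found = [
        "A tax.",
        "A 1765 tax on printed paper.",
        "British tax on colonial paper goods.",
    ]
    print(normalize_term("  Stamp   Act  "))
    print(prioritize_definitions(found, "long", 2))
```

With `long`, the most detailed definitions come first, so a small `maximum_number_of_definitions` keeps the fullest answers; `short` favors the most concise ones instead.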
`quizlet_termscraper.sh` is a Bash script for running `quizlet_termscraper_webdriver.py`. Run with:
sh ~/quizlet_termscraper/quizlet_termscraper.sh output_directory software_directory termslist_prefix i course_name driver_address
where:
- `output_directory` is the directory in which to write the markdown file.
- `software_directory` is the directory in which `quizlet_termscraper_webdriver.py` is stored.
- `termslist_prefix` is the file prefix for the file containing the list of terms.
- `i` is an iterator used when running this program on multiple lists of terms.
- `course_name` is the same as the aforementioned `course_name` argument. The value provided here is passed on to `quizlet_termscraper_webdriver.py`.
- `driver_address` is the same as the aforementioned `driver_address` argument. The value provided here is passed on to `quizlet_termscraper_webdriver.py`.
`change_maximum_number_of_definitions.py` reduces the number of definitions shown per term in a file like samples/terms_definitions.md to something like what is displayed in samples/terms_definitions_filtered.md, which shows at most three definitions per term. If I previously provided `-1` as the `maximum_number_of_definitions` argument to `quizlet_termscraper_webdriver.py`, I can use this program to display only the top three definitions. This is useful when many terms have five or more definitions, which is too many to read through. Run with:
cat quizlet_termscraper_output | python ~/quizlet_termscraper/change_maximum_number_of_definitions.py maximum_number_of_definitions > terms_definitions_filtered.md
where:
- `quizlet_termscraper_output` is the input file; that is, the markdown file that currently has too many definitions per term. See samples/terms_definitions.md for reference.
- `maximum_number_of_definitions` is the maximum number of definitions to display per term. I usually set this argument to `3`.
- `terms_definitions_filtered.md` is the output file. See samples/terms_definitions_filtered.md for reference.
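The filtering step can be sketched as below. This is an illustration only, not the contents of `change_maximum_number_of_definitions.py`. It assumes a markdown layout in which each term is a heading line starting with `#` and each definition is a bullet line starting with `- `; the real layout in samples/terms_definitions.md may differ, and the function name `cap_definitions` is hypothetical.

```python
def cap_definitions(lines, maximum):
    """Keep at most `maximum` definition bullets under each term heading.

    ASSUMED layout (not confirmed by the project): terms are lines starting
    with '#', definitions are lines starting with '- '.
    """
    kept = []
    seen = 0  # definitions encountered so far for the current term
    for line in lines:
        if line.startswith("#"):
            seen = 0  # a new term starts, so reset the counter
            kept.append(line)
        elif line.startswith("- "):
            # A negative maximum keeps every definition.
            if maximum < 0 or seen < maximum:
                kept.append(line)
            seen += 1
        else:
            kept.append(line)  # blank lines and other text pass through
    return kept


if __name__ == "__main__":
    sample = ["## Stamp Act", "- d1", "- d2", "- d3", "- d4", "", "## Tea Act", "- e1"]
    print("\n".join(cap_definitions(sample, 3)))
```

Because the definitions are already sorted by the scraper, keeping the first few bullets under each term is enough to keep the highest-priority ones. To use a function like this in a pipeline, the lines could be read from `sys.stdin` and the limit taken from `sys.argv[1]`.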
`filter_pdf.sh` is a Bash script for running `change_maximum_number_of_definitions.py`. Using pandoc, this script also converts the markdown files to PDF via LaTeX; delete line 24 of the script if you want to remove this functionality. Run with:
sh ~/Desktop/Coding/quizlet_termscraper/filter_pdf.sh output_directory software_directory i
where:
- `output_directory` is the directory in which to write the filtered markdown file.
- `software_directory` is the directory in which `change_maximum_number_of_definitions.py` is found.
- `i` is an iterator used when running this program on multiple lists of terms.
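For readers who want the PDF conversion on its own, the pandoc step could look roughly like the sketch below. It is not the command from `filter_pdf.sh`, whose exact flags are not shown here; it uses the basic `pandoc input.md -o output.pdf` form, which needs pandoc and a LaTeX installation. The helper names `pandoc_pdf_command` and `convert_to_pdf` are hypothetical.

```python
import subprocess
from pathlib import Path


def pandoc_pdf_command(markdown_path):
    """Build a pandoc command that writes a PDF next to the markdown file."""
    md = Path(markdown_path)
    # pandoc infers the output format from the .pdf extension.
    return ["pandoc", str(md), "-o", str(md.with_suffix(".pdf"))]


def convert_to_pdf(markdown_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_pdf_command(markdown_path), check=True)


if __name__ == "__main__":
    # Only prints the command, so the sketch runs without pandoc installed.
    print(" ".join(pandoc_pdf_command("terms_definitions_filtered.md")))
```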
`runner.sh` is a Bash script that runs everything mentioned so far together. Run with:
sh ~/Desktop/Coding/quizlet_termscraper/runner.sh output_directory software_directory termslist_prefix course_name driver_address
where:
- `output_directory` is the same as the aforementioned `output_directory` argument.
- `software_directory` is the same as the aforementioned `software_directory` argument.
- `termslist_prefix` is the same as the aforementioned `termslist_prefix` argument (see the `quizlet_termscraper.sh` section).
- `course_name` is the same as the aforementioned `course_name` argument.
- `driver_address` is the same as the aforementioned `driver_address` argument.
Hopefully you find this program as useful as I did. Best of luck!