GitHub - obrunet/Web_Scraping_Projects: Various projects in order to improve my scraping skills :)

2019-10-07 - Car Datasheet (1970-1982) - Basics of scraping using bs4: a first try in a notebook, then a robust script written in PyCharm, and finally an exploratory data analysis with pandas and seaborn
2019-10-20 - Pokemon's database - A slightly more complicated scraping task using bs4. The notebooks in the directory contain several first attempts, but the main and final script, written in PyCharm, is HERE. I then started a statistical analysis, but it is incomplete; I'll finish it later if I have time :)
2019-11-07 - Historical climate & meteo data - Advanced scraping using several techniques to retrieve data with bs4. The first tries are in a Jupyter notebook; the final Python script can be found here. It uses FakeUserAgent to forge requests with random, realistic browser headers, and it sleeps for a random number of seconds between requests. The script is quite robust: it handles cases such as missing values and HTTP errors.
2020-12-15 - Oxford 5000 - Using bs4 to automatically retrieve the most frequently used English words with their definitions, explanations, examples, sounds, and so on. Here is the Jupyter notebook, and the final Python script
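
The request pattern described for the meteo project (random browser headers, random pauses, tolerated HTTP errors) can be sketched roughly as follows. This is a minimal illustration and not the script from the repository: it uses only the standard library, the hardcoded `USER_AGENTS` list is a hypothetical stand-in for what FakeUserAgent would supply, and the names `http_get` and `polite_fetch`, the retry count, and the wait bounds are all assumptions.

```python
import random
import time
import urllib.error
import urllib.request

# Hypothetical stand-in for the header pool that FakeUserAgent would provide.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def http_get(url, headers, timeout=10):
    """Fetch url with the given headers and return the decoded body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def polite_fetch(url, get=http_get, retries=3, min_wait=1.0, max_wait=3.0,
                 sleep=time.sleep):
    """Return the page body, or None if every attempt fails."""
    for attempt in range(retries):
        # Pick a different realistic-looking header for each attempt.
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            return get(url, headers)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts all derive from OSError.
            print(f"attempt {attempt + 1} failed for {url}: {exc}")
        finally:
            # Pause a random number of seconds after every request,
            # whether it succeeded or not.
            sleep(random.uniform(min_wait, max_wait))
    return None
```

A caller would then treat `None` as a missing value, for example `html = polite_fetch(url)` followed by `if html is None: continue`, before handing the page to bs4. The `get` and `sleep` parameters exist only so the function can be exercised without network access.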

More ideas:

retrieve news headlines from tabloids or online newspapers
scrape tweet messages
get top 250 movies on IMDB
download every link on a specific webpage and report which links are dead
image site downloader (for image search engines)
use several containers with Tor to change IP and retrieve many pages at a time
download all docs from wikistat.fr
use scrapy / selenium
etc...
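
As an illustration of the dead-link idea in the list above, here is a minimal sketch that uses only the standard library (not bs4, and not code from this repository). The names `LinkCollector`, `extract_links`, `http_status`, and `find_dead_links` are hypothetical, and treating any status of 400 or above as "dead" is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status code, or None if the server cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except OSError:
        return None


def find_dead_links(html, base_url, status_of=http_status):
    """Return the links whose status is missing or an error code (>= 400)."""
    dead = []
    for link in extract_links(html, base_url):
        status = status_of(link)
        if status is None or status >= 400:
            dead.append(link)
    return dead
```

Some servers reject `HEAD` requests, and non-HTTP links such as `mailto:` would need to be filtered out first, so a real script would likely fall back to `GET` and skip unsupported schemes.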
