obrunet/Web_Scraping_Projects

  • 2019-10-07 - Car Datasheet (1970-1982) - Basics of scraping with bs4: a first attempt in a notebook, then a more robust script written in PyCharm, and finally an exploratory data analysis with pandas and seaborn

  • 2019-10-20 - Pokemon's database - A slightly more complicated scraping task using bs4. The notebooks in the directory contain several first attempts, but the main and final script, written in PyCharm, is HERE. I then started a statistical analysis, but it's incomplete; I'll finish it later if I have time :)

  • 2019-11-07 - Historical climate & meteo data - Advanced scraping using several techniques to retrieve data with bs4. The first attempts are in a Jupyter notebook; the final Python script can be found here. It uses FakeUserAgent to forge requests with random, realistic browser headers, and it sleeps for a random number of seconds between requests. The script is quite robust: it handles missing values, HTTP errors, and so on.

  • 2020-12-15 - Oxford 5000 - Using bs4, automatically retrieve the most frequently used English words with their definitions, explanations, examples, sounds and so on. Here is the Jupyter notebook, and the final Python script
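
The climate & meteo entry above mentions random browser headers, random sleep intervals and HTTP error handling. Here is a minimal sketch of that pattern, not the actual script from this repo: `fetch_page`, `random_user_agent`, the delay bounds and the fallback User-Agent strings are all hypothetical names and values chosen for illustration. It uses `urllib` from the standard library and falls back to a small hard-coded pool if `fake_useragent` isn't installed.

```python
import random
import time
import urllib.error
import urllib.request

# Fallback pool, used only when fake_useragent is unavailable (illustrative values).
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

try:
    from fake_useragent import UserAgent
    _user_agent = UserAgent()
except Exception:  # library missing, or its browser data could not be loaded
    _user_agent = None


def random_user_agent():
    """Return a realistic User-Agent string, chosen at random."""
    if _user_agent is not None:
        try:
            return _user_agent.random
        except Exception:
            pass  # fall through to the hard-coded pool
    return random.choice(FALLBACK_USER_AGENTS)


def fetch_page(url, min_delay=1.0, max_delay=3.0,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url` politely; return the page text, or None on an HTTP/network error.

    `opener` and `sleep` are injectable so the function can be tried offline.
    """
    # Wait a random number of seconds so requests are not sent in a burst.
    sleep(random.uniform(min_delay, max_delay))

    request = urllib.request.Request(
        url, headers={"User-Agent": random_user_agent()}
    )
    try:
        with opener(request, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {err.code} for {url}")
    except urllib.error.URLError as err:
        print(f"Network error for {url}: {err.reason}")
    return None
```

The returned text can then be handed to bs4, for example `BeautifulSoup(html, "html.parser")`, after checking that it isn't `None`. Returning `None` instead of raising keeps a long scraping loop going when a single page fails.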


More ideas:

  • retrieve news headlines from tabloids or online newspapers
  • scrape tweet messages
  • get the top 250 movies on IMDB
  • download every link on a specific webpage, and report which links are dead
  • image site downloader (for image search engines)
  • use several containers with Tor to change IP and retrieve many pages at once
  • download all docs from wikistat.fr
  • use scrapy / selenium
  • etc...

About

Various projects in order to improve my scraping skills :)
