WebCrawler

A web crawler that crawls a webpage in BFS order and returns the depth from the origin, the most frequent word, and the number of valid external links on the page.

Instructions:

  1. Install Beautiful Soup using pip install beautifulsoup4
  2. Run BFS_Crawler.py using python BFS_Crawler.py
  3. Enter a valid URL starting with http:// or https://
  4. Enter an integer specifying the maximum number of pages to crawl
  5. Enter an integer specifying the request timeout (in seconds)

The results will be printed on the console and a log.txt file will be generated.
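For orientation, the crawling approach can be sketched roughly as below. This is a minimal illustration of the BFS idea using requests and Beautiful Soup, not the actual code in BFS_Crawler.py; the crawl() function, its parameters, and the example URL are placeholders.

```python
# Minimal sketch of a BFS crawl (illustrative only, not the repository's implementation).
from collections import Counter, deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def crawl(start_url, max_pages=10, timeout=5):
    """Visit pages breadth-first; report depth, top word, and external link count per page."""
    origin_host = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])  # (url, depth from origin)
    visited = set()

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if url in visited:
            continue
        visited.add(url)

        try:
            response = requests.get(url, timeout=timeout)
        except requests.RequestException:
            continue  # skip pages that fail to load within the timeout

        soup = BeautifulSoup(response.text, "html.parser")

        # Most frequent word on the page (naive whitespace tokenisation).
        words = soup.get_text().lower().split()
        top_word = Counter(words).most_common(1)[0][0] if words else ""

        # Count links to other hosts as external; enqueue same-host links for BFS.
        external = 0
        for tag in soup.find_all("a", href=True):
            link = urljoin(url, tag["href"])
            if urlparse(link).scheme not in ("http", "https"):
                continue
            if urlparse(link).netloc != origin_host:
                external += 1
            else:
                queue.append((link, depth + 1))

        print(f"{url} | depth={depth} | top word={top_word!r} | external links={external}")


if __name__ == "__main__":
    crawl("https://example.com", max_pages=5, timeout=5)
```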
