CivilianRebel/python-simple-crawler

python-simple-crawler

A simple web crawler written in Python using BeautifulSoup (bs4) and SQLite3.

Requirements

  • BeautifulSoup
    • pip install --upgrade bs4
    • Used to parse the fetched HTML. What better parser is out there anyway?
  • TLD
    • pip install --upgrade tld
    • For easy domain extraction (and laziness).
  • (optional) LXML
    • pip install --upgrade lxml
    • A good, fast parser backend for BeautifulSoup. If you end up using a different parser, change the line in the find_links function that creates the BeautifulSoup object so it names your parser.
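
To illustrate the parser note above, here is a minimal sketch of what a link-finding function with a swappable parser might look like. It is not the repository's actual find_links; the signature, the make_soup helper, and the fallback to Python's built-in "html.parser" are assumptions made for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, parser="lxml"):
    """Build a BeautifulSoup object, falling back to the stdlib parser.

    Hypothetical helper: the fallback is an assumption, not something
    the repository is known to do.
    """
    try:
        return BeautifulSoup(html, parser)
    except FeatureNotFound:
        # Raised by bs4 when the requested parser (e.g. lxml) is not installed.
        return BeautifulSoup(html, "html.parser")


def find_links(html, base_url, parser="lxml"):
    """Return absolute URLs for every <a href=...> found in the HTML."""
    soup = make_soup(html, parser)
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page they were found on.
        links.append(urljoin(base_url, anchor["href"]))
    return links


if __name__ == "__main__":
    page = '<a href="/about">About</a> <a href="https://example.org/x">X</a>'
    for link in find_links(page, "https://example.com/"):
        print(link)
```

In this sketch, switching parsers means changing the single string passed to BeautifulSoup, which is the kind of one-line change the note above describes.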

About

Simple infinite web crawler made in Python using BS4 and SQLite3
