bs_and_requests.py is a simple, single-file spider: it fetches a page and only prints the links it finds, without queuing or crawling them further. The remaining files make up a multi-threaded spider/crawler. When main.py is run, it creates two text files, "queued.txt" and "crawled.txt", where discovered and visited links are stored. There appears to be a problem with the multi-threading implementation: crawling freezes after some time and should be investigated.
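For context, here is a minimal sketch of what a single-page link lister along the lines of bs_and_requests.py might look like, assuming it uses the requests and BeautifulSoup libraries as the file name suggests. The function name and the example URL are illustrative, not the repository's actual code.

```python
# Minimal single-spider sketch: fetch one page and print its links
# without queuing or following them (assumed behaviour, not the
# repository's exact implementation).
import requests
from bs4 import BeautifulSoup

def list_links(url):
    # Fetch the page and fail loudly on HTTP errors.
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    # Parse the HTML and print every href found in an anchor tag.
    soup = BeautifulSoup(response.text, "html.parser")
    for anchor in soup.find_all("a", href=True):
        print(anchor["href"])

if __name__ == "__main__":
    list_links("https://example.com")
```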
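For the multi-threaded part, the sketch below shows one common way to structure a queue-driven crawler that persists its state to queued.txt and crawled.txt, as described above. The worker layout, helper names, and start URL are assumptions rather than the repository's code. One plausible cause of the freeze mentioned above is a worker thread dying on an unhandled exception before calling task_done(), which leaves queue.join() blocked forever; the sketch guards against that with a try/finally.

```python
# Sketch of a queue-driven multi-threaded crawler (illustrative names).
import threading
from queue import Queue
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://example.com"          # assumed start page
DOMAIN = urlparse(START_URL).netloc        # stay on one domain
NUM_WORKERS = 8

queued_urls = Queue()
crawled = set()
crawled_lock = threading.Lock()

def extract_links(base_url, html):
    # Return absolute links found on the page.
    soup = BeautifulSoup(html, "html.parser")
    return {urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)}

def worker():
    while True:
        url = queued_urls.get()
        try:
            # Skip URLs that another worker has already claimed.
            with crawled_lock:
                if url in crawled:
                    continue
                crawled.add(url)
            response = requests.get(url, timeout=10)
            for link in extract_links(url, response.text):
                if urlparse(link).netloc == DOMAIN:
                    queued_urls.put(link)
        except requests.RequestException:
            pass  # Skip pages that fail to download.
        finally:
            # Always mark the task done, even on error. If a worker dies
            # without doing this, queued_urls.join() never returns and the
            # crawl appears to freeze.
            queued_urls.task_done()

def save_state():
    # Persist state to the two text files main.py is described as producing.
    with open("crawled.txt", "w") as f:
        f.write("\n".join(sorted(crawled)))
    with open("queued.txt", "w") as f:
        # After a complete crawl the frontier is usually empty.
        f.write("\n".join(sorted(set(queued_urls.queue))))

if __name__ == "__main__":
    queued_urls.put(START_URL)
    for _ in range(NUM_WORKERS):
        threading.Thread(target=worker, daemon=True).start()
    queued_urls.join()   # Wait until every queued URL has been processed.
    save_state()
```

Restricting the frontier to a single domain keeps the crawl bounded; dropping that check would let the queue grow indefinitely, which can also look like a hang even when the threads are still running.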