
# UrlRobotsChecker

Check whether a given URL is allowed to be scraped according to the target website's robots.txt rules.

## Usage

```python
from url_robots_checker import UrlRobotsChecker

url = 'foobar.com'
url_robots_checker = UrlRobotsChecker(url)

# can_fetch() returns False when the site's robots.txt disallows the URL
if not url_robots_checker.can_fetch(url):
    print("Not allowed to scrape")
```
