
[web-crawler] robots.txt Politeness #6

@CSenshi

Summary

Implement a feature in the web crawler that automatically discovers, fetches, parses, and enforces the rules specified in a website’s robots.txt file before crawling any URLs from that domain.

This includes respecting Disallow, Allow, and Crawl-delay directives, and ensuring that the crawler does not access or queue URLs that are forbidden by the site's robots.txt policy. The crawler should cache robots.txt files per host so the same file is not re-fetched for every URL.
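
A minimal sketch of how this could fit together, assuming a TypeScript codebase on Node 18+ (for the built-in `fetch`); the names used here (`USER_AGENT`, `RobotsPolicy`, `RobotsCache`) are illustrative and not taken from the repository. The parser is intentionally simplified: prefix matching only (no `*` wildcards or `$` anchors), and all groups matching our user agent or `*` are merged.

```typescript
// Illustrative sketch only: per-host robots.txt caching with a simplified parser.
// USER_AGENT, RobotsPolicy and RobotsCache are hypothetical names, not existing code.

const USER_AGENT = 'web-crawler'; // assumed bot token; use the crawler's real UA string

interface RobotsPolicy {
  allow: string[];      // path prefixes explicitly allowed
  disallow: string[];   // path prefixes disallowed
  crawlDelayMs: number; // 0 when no Crawl-delay directive applies
}

// Parse only the directive groups that apply to our user agent (or `*`).
// Simplified: prefix rules only, and all matching groups are merged.
function parseRobots(body: string): RobotsPolicy {
  const policy: RobotsPolicy = { allow: [], disallow: [], crawlDelayMs: 0 };
  let applies = false;    // current group applies to our user agent
  let inUaHeader = false; // still reading consecutive User-agent lines of a group
  for (const rawLine of body.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep < 0) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      const matches = value === '*' || value.toLowerCase().includes(USER_AGENT);
      applies = inUaHeader ? applies || matches : matches; // consecutive UA lines share one group
      inUaHeader = true;
    } else {
      inUaHeader = false;
      if (!applies) continue;
      if (field === 'disallow' && value) policy.disallow.push(value);
      else if (field === 'allow' && value) policy.allow.push(value);
      else if (field === 'crawl-delay') policy.crawlDelayMs = (Number(value) || 0) * 1000;
    }
  }
  return policy;
}

class RobotsCache {
  private policies = new Map<string, Promise<RobotsPolicy>>();

  // Fetch and parse robots.txt once per origin; concurrent callers share the same promise.
  private policyFor(origin: string): Promise<RobotsPolicy> {
    let cached = this.policies.get(origin);
    if (!cached) {
      cached = fetch(`${origin}/robots.txt`)
        .then((res) => (res.ok ? res.text() : '')) // missing/unreadable file => allow everything
        .catch(() => '')
        .then(parseRobots);
      this.policies.set(origin, cached);
    }
    return cached;
  }

  // A URL is allowed unless a Disallow rule matches it more specifically than any Allow rule.
  async isAllowed(url: string): Promise<boolean> {
    const { origin, pathname } = new URL(url);
    const policy = await this.policyFor(origin);
    const longestMatch = (rules: string[]) =>
      rules.filter((r) => pathname.startsWith(r)).reduce((max, r) => Math.max(max, r.length), 0);
    return longestMatch(policy.allow) >= longestMatch(policy.disallow);
  }

  // Milliseconds to wait between requests to this URL's host; 0 if no Crawl-delay is set.
  async crawlDelayMs(url: string): Promise<number> {
    return (await this.policyFor(new URL(url).origin)).crawlDelayMs;
  }
}
```

A crawler worker could call `isAllowed()` right before queueing or fetching a URL and use `crawlDelayMs()` to space out requests to the same host. A real implementation would also need cache expiry and more careful handling of 4xx/5xx responses for robots.txt, as described in RFC 9309, which this sketch glosses over.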

Affected Area(s)

Apps:

  • Url Shortener (apps/url-shortener)
  • Web Crawler (apps/web-crawler)

Libraries:

  • Shared (libs/shared)


Motivation

Respecting robots.txt prevents overloading servers and avoids crawling restricted areas, aligning with industry best practices and ethical standards.


Labels

enhancement (New feature or request)
