feat: Add `sitemap.xml` support for efficient site discovery #19

I'm loving this MCP, but I sometimes feel it can't fetch all of the docs from a site.

## Problem

Currently, this MCP discovers pages by extracting links from HTML `<a>` tags while crawling. This approach works well, but it can be inefficient for large sites and may miss pages that no other page links to.
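To illustrate the gap, here is a toy sketch of link-following discovery. The `crawl` function and the `SITE_LINKS` graph are made up for this example and are not this project's crawler; they only show how a page with no inbound links never gets reached.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first discovery that only follows links found on visited pages."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each key is a page, each value is the list of pages it links to.
SITE_LINKS = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/intro"],
    "/docs/intro": ["/docs"],
    "/blog": [],
    "/docs/legacy-api": [],  # exists on the site, but no page links to it
}

reached = crawl("/", SITE_LINKS)
missed = set(SITE_LINKS) - reached  # pages the link crawl never found
```

In this toy graph, `missed` contains only `/docs/legacy-api`, which is the kind of page a sitemap would still list.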

Some sites (like https://nextjs.org) publish a `sitemap.xml`, so I think using it could make site crawling more efficient.

## Proposed Enhancement

1. Automatically detect `sitemap.xml` at common locations (`/sitemap.xml`, or the location referenced in `/robots.txt`)
2. Add a configuration option that lets users enable or disable sitemap usage, e.g. via a CLI flag such as `--sitemap=/sitemap.xml`
3. Parse the XML structure to extract all URLs listed in the sitemap
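As a rough sketch of steps 1 and 3, assuming the sitemaps.org format and using only the Python standard library: the function names `sitemaps_from_robots` and `parse_sitemap` are hypothetical and not part of this project, and fetching the files over HTTP is left out on purpose.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <sitemapindex>, and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared on `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, nested_sitemap_urls) for one sitemap document.

    A <sitemapindex> lists further sitemaps rather than pages, so its
    <loc> values are returned in the second list for the caller to fetch.
    """
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        return [], locs
    return locs, []


# Example inputs (made-up URLs on example.com).
robots = "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n"
sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/docs/a</loc></url>"
    "<url><loc>https://example.com/docs/b</loc></url>"
    "</urlset>"
)

declared = sitemaps_from_robots(robots)
pages, nested = parse_sitemap(sitemap)
```

The extracted URLs could then seed the existing crawl queue, with `/sitemap.xml` as a fallback when `robots.txt` declares nothing. How that fits the current scraper is something I'd want your input on.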

---

If you like this proposal, I'm happy to work on it. What do you think?
