I love using this MCP, but I sometimes find it doesn't fetch all of a site's docs.
Problem
Currently, pages are discovered by extracting links from HTML `<a>` tags during the crawl. This works well, but it can be inefficient for large sites and may miss pages that aren't linked from any other page.
Some sites (like https://nextjs.org) provide a `sitemap.xml`, so using it could make crawling more efficient and more complete.
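For context, link-based discovery works roughly like the sketch below. This is a simplified illustration under my own assumptions (the `extractLinks` name and the regex-based parsing are mine, not the actual implementation, which presumably uses a proper HTML parser):

```ts
// Simplified illustration of <a>-tag link discovery during a crawl.
// Only same-origin links are kept; fragments and malformed hrefs are skipped.
function extractLinks(html: string, baseUrl: string): string[] {
  const links: string[] = []
  const origin = new URL(baseUrl).origin
  // Naive href extraction; a real crawler would use an HTML parser instead.
  const hrefRegex = /<a\s[^>]*href=["']([^"'#]+)["']/gi
  let match: RegExpExecArray | null
  while ((match = hrefRegex.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl)
      if (url.origin === origin) links.push(url.href)
    } catch {
      // Ignore hrefs that don't form a valid URL.
    }
  }
  return links
}
```

The limitation is visible here: a page is only ever discovered if some already-crawled page links to it.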
Proposed Enhancement
- Automatically detect `sitemap.xml` at common locations (`/sitemap.xml`, or wherever it is referenced in `/robots.txt`)
- Add a configuration option: let users enable/disable sitemap usage via a CLI flag, e.g. `--sitemap=/sitemap.xml`
- Parse the XML to extract all URLs listed in the sitemap (see the sketch after this list)
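Here is a rough sketch of how the whole flow could look, assuming Node 18+'s global `fetch`. The function name `discoverSitemapUrls` and the regex-based `<loc>` extraction are placeholders; a real implementation should use an XML parser and also handle sitemap index files (`<sitemapindex>`) that point to child sitemaps:

```ts
// Hypothetical sketch: find a sitemap via robots.txt or the default
// location, then collect every <loc> URL it lists.
async function discoverSitemapUrls(siteUrl: string): Promise<string[]> {
  const origin = new URL(siteUrl).origin

  // 1. Collect "Sitemap:" directives from /robots.txt, if present.
  const candidates: string[] = []
  const robotsRes = await fetch(`${origin}/robots.txt`)
  if (robotsRes.ok) {
    for (const line of (await robotsRes.text()).split('\n')) {
      const m = line.match(/^\s*sitemap:\s*(\S+)/i)
      if (m) candidates.push(m[1])
    }
  }

  // 2. Fall back to the conventional default location.
  if (candidates.length === 0) candidates.push(`${origin}/sitemap.xml`)

  // 3. Fetch each candidate and extract its <loc> entries.
  const urls = new Set<string>()
  for (const sitemapUrl of candidates) {
    const res = await fetch(sitemapUrl)
    if (!res.ok) continue
    const xml = await res.text()
    for (const m of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)) {
      urls.add(m[1])
    }
  }
  return [...urls]
}
```

The crawler could seed its queue with these URLs and keep `<a>`-tag discovery as a fallback for sites without a sitemap, so existing behavior is unchanged when the flag is off.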
If you like this proposal, I'd be happy to work on it. What do you think?