Open
Labels
enhancement (New feature or request)
Description
Some sites have been flagged as having content in the index but no home page. See #150 and #102 for why this is an issue.
At least two sites have a robots.txt that blocks access to almost all of the site, but not all of it. Because something is still allowed, the site isn't automatically deindexed, yet the index doesn't contain any useful content:
- https://mike-burns.com/robots.txt disallows indexing, but https://keys1.mike-burns.com/keys.atom is allowed and is the only document in the index.
- https://lostletters.neocities.org/robots.txt allows indexing the feed at https://lostletters.neocities.org/feed.xml but disallows everything else, including the home page and the pages the feed points to, so https://lostletters.neocities.org/feed.xml becomes the only document in the index.
The current workaround is to identify these sites manually and disable indexing for each of them by hand. This needs some thought about whether there is a better way of handling it; no concrete ideas at this stage.
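As a starting point for discussion, the manual check could perhaps be automated along the lines below. This is only a sketch using Python's standard `urllib.robotparser`, not how the project currently works: the function names `is_blocked` and `looks_hollow`, the `user_agent` default, and the idea of passing in the list of indexed URLs are all assumptions made for illustration. The sample robots.txt uses the placeholder domain example.org and is not copied from either site above.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows fetching url."""
    parser = RobotFileParser()
    # parse() takes the file content as lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def looks_hollow(robots_txt: str, home_url: str, indexed_urls: list[str]) -> bool:
    """Hypothetical heuristic: the site has indexed documents, but its
    home page is disallowed by robots.txt."""
    return bool(indexed_urls) and is_blocked(robots_txt, home_url)


# Illustrative robots.txt: only the feed is allowed, everything else is blocked.
SAMPLE_ROBOTS = """\
User-agent: *
Allow: /feed.xml
Disallow: /
"""

if __name__ == "__main__":
    home = "https://example.org/"
    indexed = ["https://example.org/feed.xml"]
    print(looks_hollow(SAMPLE_ROBOTS, home, indexed))
```

A site flagged this way would still need a decision about what to do with it (for example, queueing it for review rather than disabling indexing automatically), and a case like the first one above, where the allowed document is on a different subdomain from the home page, would need the robots.txt for each host to be checked separately.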