Indexing: robots.txt blocks access to most but not all of the site #151

@m-i-l

Some sites have been flagged as having content in the index but no home page. See #150 and #102 for why this is an issue.

For at least two of these sites, the robots.txt blocks access to almost all of the site, but not all of it. Because some pages remain crawlable, the site isn't automatically deindexed, yet the index doesn't contain any useful content for it.

The current workaround is to identify these sites manually and then manually disable indexing for them. This needs some thought as to whether there is a better way of handling it; no clear ideas at this stage.
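
One possible starting point for automating the manual check is a heuristic like the sketch below. It is not the indexer's actual code. It uses Python's standard `urllib.robotparser` to measure what fraction of a site's known URLs a robots.txt disallows, and flags the site if the home page is blocked or the blocked fraction passes a threshold. The function names, the `threshold` default of 0.9 and the example URLs are all hypothetical, chosen for illustration only.

```python
from urllib.robotparser import RobotFileParser


def _parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt content already fetched as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_fraction(robots_txt: str, urls: list[str], user_agent: str = "*") -> float:
    """Return the fraction of `urls` that `robots_txt` disallows for `user_agent`."""
    if not urls:
        return 0.0
    parser = _parser_for(robots_txt)
    blocked = sum(1 for url in urls if not parser.can_fetch(user_agent, url))
    return blocked / len(urls)


def looks_mostly_blocked(
    robots_txt: str,
    home_url: str,
    sample_urls: list[str],
    threshold: float = 0.9,  # hypothetical cut-off, would need tuning
    user_agent: str = "*",
) -> bool:
    """Flag a site whose home page is blocked, or whose sampled URLs are mostly blocked."""
    parser = _parser_for(robots_txt)
    if not parser.can_fetch(user_agent, home_url):
        return True
    return blocked_fraction(robots_txt, sample_urls, user_agent) >= threshold


# Example: everything is disallowed except a single feed URL.
# The Allow line comes first because urllib.robotparser applies
# the first matching rule in file order.
MOSTLY_BLOCKED = "User-agent: *\nAllow: /feed.xml\nDisallow: /\n"
SAMPLE = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/post-1",
    "https://example.com/feed.xml",
]

if __name__ == "__main__":
    print(blocked_fraction(MOSTLY_BLOCKED, SAMPLE))
    print(looks_mostly_blocked(MOSTLY_BLOCKED, SAMPLE[0], SAMPLE))
```

A site flagged this way could be queued for manual review rather than having indexing disabled automatically, since a sample of URLs may not be representative of the whole site.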

Labels: enhancement