Clarify policy of the kinds of agents we block

I'd like the README to be more explicit about the kinds of bots that we block.

This issue was raised while reviewing https://github.com/ai-robots-txt/ai.robots.txt/pull/126. In that case, I couldn't find a bot listed for training Mistral AI's LLM, so in practice we don't limit the list to bots from AI companies whose training bots we also block.

The current statement:

> This is an open list of web crawlers associated with AI companies **and** the training of LLMs to block.

(emphasis added)

seems a little too restrictive. But, depending on the language we end up with, we could find ourselves blocking all bots from companies involved in AI, such as Google, and users' websites could then stop being indexed by those companies' search engines. I think we want to steer clear of blocking search engine indexing bots, otherwise that will inhibit adoption of our robots.txt _et al._
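
To make the distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The agent names `GPTBot` and `Googlebot`, the `example.com` URL, and the `is_allowed` helper are illustrative assumptions on my part, not a proposal for what the list should contain. It only shows that a rule targeting a training crawler can leave a search indexing bot unaffected:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI training crawler, say nothing about others.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/page") -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent) else "blocked")
```

A policy worded as "all bots from AI companies" would instead put the search crawler under a `Disallow` rule too, which is the outcome I'd like the README wording to rule out.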

Clarify policy of the kinds of agents we block #127

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Clarify policy of the kinds of agents we block #127

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions