Description
I'd like the README to be more explicit about the kinds of bots that we block.
This issue came up while reviewing #126. In that case, I couldn't find a listed bot used for training Mistral AI's LLMs, so our list isn't limited to bots from AI companies whose LLM-training crawlers we also block.
The current statement:
This is an open list of web crawlers associated with AI companies and the training of LLMs to block.
(emphasis added)
seems a little too restrictive. However, depending on the wording we settle on, we might end up blocking every bot from companies involved in AI, such as Google, and our users' websites could then stop being indexed by those companies' search engines. I think we should steer clear of blocking search-engine indexing bots; otherwise, that will hinder adoption of our robots.txt et al.
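To make the distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a hypothetical robots.txt that names only AI-training crawler tokens (`GPTBot` and `Google-Extended` are illustrative examples, not necessarily entries in our list) and checks that a search crawler such as `Googlebot` is still allowed. The helper `is_allowed` is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names only AI-training crawler tokens.
# GPTBot and Google-Extended are illustrative examples; they are not
# necessarily entries in our list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        # Prints each agent followed by whether it may fetch the page.
        print(agent, is_allowed(agent, page))
```

Because no entry matches `Googlebot` and there is no `User-agent: *` group, the parser falls back to allowing it. Adding a `User-agent: *` group with `Disallow: /` would block search crawlers too, which is the outcome the wording should avoid implying.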