Block GPTBot by default
This might be too opinionated of me to add, but I honestly think it is important for preventing users' data from being used to train AI models without their knowledge or consent.
Some notes/questions
- I don't know if `ChatGPT-User` should be added as well?
- Should we assume the answer to the block GPTBot question if they configure the search to not be indexable?
- I wonder if we should make `robots.txt` more dynamically configurable from the admin dashboard in the future, especially since the number of AI user agents to block can and likely will grow. In that case, I wonder whether the list of AI user agents to block should be pulled and updated from somewhere else.
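To make the last note a bit more concrete, here is a rough Python sketch of rendering `robots.txt` rules from a configurable list of user agents. It is only an illustration of the idea, not how this project generates the file: the names `AI_USER_AGENTS` and `render_robots_txt` are hypothetical, and the list could just as well come from the admin dashboard or an external source.

```python
# Hypothetical list of crawler user agents to block.
# This MR only blocks GPTBot; whether to add ChatGPT-User is an open question.
AI_USER_AGENTS = ["GPTBot"]


def render_robots_txt(blocked_agents):
    """Return robots.txt text that disallows every path for each listed agent."""
    # One group per agent; groups are separated by a blank line.
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    return "\n".join(groups)


if __name__ == "__main__":
    print(render_robots_txt(AI_USER_AGENTS), end="")
```

With a list like this, blocking another agent would only mean appending its name, rather than editing the template by hand.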
Checklist
- Adding a changelog: In the `changelog.d` directory, create a file named `<code>.<type>`.

  `<code>` can be anything, but we recommend using a more or less unique identifier to avoid collisions, such as the branch name.

  `<type>` can be `add`, `remove`, `fix`, `security` or `skip`. `skip` is only used if there is no user-visible change in the MR (for example, only editing comments in the code). Otherwise, choose a type that corresponds to your change.

  In the file, write the changelog entry. For example, if an MR adds group functionality, we can create a file named `group.add` and write `Add group functionality` in it.

  If one changelog entry is not enough, you may add more. But that might mean you can split it into two MRs. Only use more than one changelog entry if you really need to (for example, when one change in the code fixes two different bugs, or when refactoring).