Block GPTBot by default (!3942) · Pleroma / pleroma

Sean King requested to merge add/block-gptbot into develop Aug 22, 2023

This may be too opinionated an addition, but I think it matters: it helps keep users' data from being used to train AI models without their knowledge or consent.

Some notes/questions

Should ChatGPT-User be blocked as well?
If an admin configures the instance so that search engines may not index it, should we assume they also want GPTBot blocked?
In the future, we might make robots.txt configurable from the admin dashboard, especially since the number of AI user agents to block will likely grow. In that case, should the list of AI user agents to block be pulled from, and kept up to date by, an external source?
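The notes above raise the idea of generating robots.txt from a configurable list of AI user agents. Below is a minimal Python sketch of that idea, for illustration only. Pleroma itself is written in Elixir. The function names `build_robots_txt` and `parse_agent_list`, the `disallow_all` flag, and the one-agent-per-line list format for an externally maintained list are all hypothetical and do not come from this MR. Each listed agent gets its own `User-agent` group with `Disallow: /`. The `disallow_all` flag covers the case where the instance is already configured to be non-indexable.

```python
# Hypothetical sketch: names and list format are illustrative, not Pleroma's code.

def parse_agent_list(text):
    """Parse a plain-text list of user agents (assumed format: one per line).

    Everything after '#' is treated as a comment; blank lines are skipped.
    """
    agents = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            agents.append(name)
    return agents


def build_robots_txt(ai_agents, disallow_all=False):
    """Return robots.txt text that disallows every path for each listed agent.

    If disallow_all is True (e.g. the instance is set to be non-indexable),
    emit a single wildcard group instead. Crawlers without a group of their
    own, GPTBot included, then fall back to that wildcard group.
    """
    if disallow_all:
        return "User-agent: *\nDisallow: /\n"
    groups = []
    # dict.fromkeys drops duplicate names while keeping their original order
    for agent in dict.fromkeys(ai_agents):
        groups.append(f"User-agent: {agent}\nDisallow: /\n")
    # A blank line separates the groups
    return "\n".join(groups)


if __name__ == "__main__":
    agents = parse_agent_list("GPTBot\nChatGPT-User  # still an open question\n")
    print(build_robots_txt(agents), end="")
```

Keep in mind that robots.txt is advisory. It asks well-behaved crawlers to stay away, but it cannot enforce anything against crawlers that ignore it.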

Checklist

Adding a changelog: In the changelog.d directory, create a file named <code>.<type>.

<code> can be anything, but we recommend using a more or less unique identifier to avoid collisions, such as the branch name.

<type> can be add, remove, fix, security or skip. skip is only used if there is no user-visible change in the MR (for example, only editing comments in the code). Otherwise, choose a type that corresponds to your change.

In the file, write the changelog entry. For example, if an MR adds group functionality, we can create a file named group.add and write "Add group functionality" in it.

If one changelog entry is not enough, you may add more, but that may be a sign the change can be split into two MRs. Only use more than one changelog entry if you really need to (for example, when one change in the code fixes two different bugs, or when refactoring).
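The changelog steps above can be sketched as a small Python helper. This is a hypothetical convenience function, not a tool that ships with Pleroma. The name `write_changelog_entry` and its parameters are illustrative. Rejecting a `/` in `<code>` is also an assumption, made because a slash would create a subdirectory. A branch name such as add/block-gptbot would therefore need a slash-free form like block-gptbot. The helper also ensures the file ends with a single trailing newline, which is an assumed formatting choice.

```python
from pathlib import Path

# The types listed in the checklist
ALLOWED_TYPES = {"add", "remove", "fix", "security", "skip"}


def write_changelog_entry(code, change_type, entry, root="."):
    """Create changelog.d/<code>.<type> under root containing entry; return the path.

    Hypothetical helper for illustration only.
    """
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown changelog type: {change_type!r}")
    if not code or "/" in code:
        # A '/' would place the file in a subdirectory of changelog.d
        raise ValueError("code must be a non-empty name without '/'")
    directory = Path(root) / "changelog.d"
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{code}.{change_type}"
    # Normalize to exactly one trailing newline
    path.write_text(entry.rstrip("\n") + "\n", encoding="utf-8")
    return path
```

Calling `write_changelog_entry("group", "add", "Add group functionality")` from the repository root would create changelog.d/group.add for the example described above.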

Edited Aug 22, 2023 by Sean King
