Combatting gitlab spam vs being approachable #54

Open
opened 2021-02-14 17:24:40 +00:00 by ilja · 12 comments
Member

Spam on our GitLab instance has been a problem for a while. Different things have been tried in the past to combat it, but combating spam may also make Pleroma less approachable for new people who want to contribute.

We've already tried a honeypot, which didn't seem to have any effect[1]. I remember reCAPTCHA was turned on at some point. Now we ask people to request an account in a dedicated chatroom. There's a link and a web interface, so I think this is about as easy as we can make it, but sometimes people don't get a response[2].

I don't really have an answer for this problem myself, but at least wanted to make sure this is known.

[1] https://gitlab.com/gitlab-org/gitlab-foss/-/issues/46548#note_244919214

[2] (image attachment: /attachments/d4b37547-928c-4d0f-bf2c-88519e83087f)

Member

Honestly, I wonder if we should even try to combat spam. It is annoying for sure, but what can the spammers really do? Create a bunch of empty projects and get some free SEO? They can't really harm the instance much and don't seem to use a lot of storage. Creating accounts manually on request from IRC, however, is quite annoying, both for the people submitting requests and for the people handling them.

I have an approach that I think strikes a middle ground between approachability and combating spam: create a fediverse bot that automates registration for those who are already on fedi (which is the majority of bug reporters/contributors). They just PM it an email and a preferred username, and we create the account using the GitLab admin API. GitLab spammers won't bother registering on fedi, considering we are not the only ones hosting an open GitLab instance. But until code is written for that approach, I wonder if we should keep the IRC registration thing.
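A minimal sketch of what the bot's account-creation step could look like, in Python. It assumes the bot has already extracted the plain text of a DM in a hypothetical `<email> <username>` form and holds a GitLab admin personal access token; `GITLAB_URL`, `ADMIN_TOKEN`, `parse_request` and `build_create_user_request` are illustrative names, not existing code. It only builds the call to GitLab's user creation endpoint (`POST /api/v4/users`) and leaves sending it to the caller.

```python
import json
import re
import urllib.request

# Hypothetical values: the real instance URL and an admin token go here.
GITLAB_URL = "https://gitlab.example.org"
ADMIN_TOKEN = "REPLACE_WITH_ADMIN_TOKEN"

# Deliberately loose checks; GitLab does its own validation on creation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
USERNAME_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]*$")


def parse_request(text):
    """Parse a DM like '@regbot alice@example.org alice' into (email, username).

    Tokens starting with '@' are treated as mentions and skipped (assumption).
    """
    words = [w for w in text.split() if not w.startswith("@")]
    if len(words) != 2:
        raise ValueError("expected '<email> <username>'")
    email, username = words
    if not EMAIL_RE.match(email):
        raise ValueError(f"not an email address: {email!r}")
    if not USERNAME_RE.match(username):
        raise ValueError(f"invalid username: {username!r}")
    return email, username


def build_create_user_request(email, username):
    """Build (but do not send) an admin API call that creates the account."""
    payload = {
        "email": email,
        "username": username,
        "name": username,
        # GitLab emails the user a link to set their own password,
        # so the bot never handles passwords.
        "reset_password": True,
    }
    return urllib.request.Request(
        f"{GITLAB_URL}/api/v4/users",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": ADMIN_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )
```

The bot could then send it with `urllib.request.urlopen(build_create_user_request(*parse_request(dm_text)))`, where `dm_text` is the DM's plain text. Per-account rate limiting and duplicate checks are not included in this sketch.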

Owner

The fediverse bot idea is good but needs refinement IMO. We have quite a lot of script kiddies on the network who would be more than happy to exploit the system just for shits and giggles.

Member

I don't think this is a concern. You still need an email for registering the account. It is certainly better than open registrations.

Owner

it just sounds like it will quickly become an equivalent of open registrations

Member

It will not, at least for a while. Our spam comes from spammers who hunt for open GitLab instances, not from fediverse script kiddies, and they would not bother registering on the fediverse considering how many other open-registration GitLab instances there are.

Owner

idk, it might as well invite (more) spammers to fediverse. I mean they already have emails to spam, why would they ever consider joining literally-whos gitlab instances, amirite?

either way this solution is better than nothing, I'm just thinking about how it could be better integrated to prevent abuse, i.e. they would still have to create an account on GitLab and then request approval through the fedi bot, instead of using the fedi bot to create an account for them.
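A rough sketch of that approval variant, in Python, assuming "require administrator approval for new sign-ups" is enabled on the instance and that its GitLab version provides the Users API approve endpoint (`POST /users/:id/approve`). The instance URL, token and function names are hypothetical, and how the bot decides *whom* to approve is left out. The `send` parameter defaults to `urllib.request.urlopen` and exists only so the flow can be exercised without a live instance.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values for illustration.
GITLAB_URL = "https://gitlab.example.org"
ADMIN_TOKEN = "REPLACE_WITH_ADMIN_TOKEN"


def _api_request(path, method="GET"):
    """Build an authenticated request against the GitLab v4 REST API."""
    return urllib.request.Request(
        f"{GITLAB_URL}/api/v4{path}",
        headers={"PRIVATE-TOKEN": ADMIN_TOKEN},
        method=method,
    )


def approve_by_username(username, send=urllib.request.urlopen):
    """Look up a pending user by username and approve them; return the user id."""
    # GET /users?username=... returns a JSON list (empty if no such user).
    query = urllib.parse.urlencode({"username": username})
    with send(_api_request(f"/users?{query}")) as resp:
        users = json.loads(resp.read())
    if not users:
        raise LookupError(f"no GitLab user named {username!r}")
    user_id = users[0]["id"]
    # POST /users/:id/approve lifts the "pending approval" state.
    with send(_api_request(f"/users/{user_id}/approve", method="POST")) as resp:
        resp.read()
    return user_id
```

With the real `urlopen`, an error response from GitLab raises `urllib.error.HTTPError`, so a failed approval would not be silently reported as success.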

Member

> why would they ever consider joining literally-whos gitlab instances, amirite?

It is not like they spam things for us to see, they are doing it for SEO.

> I'm just thinking about how it should better integrated to prevent abuse, i.e. they still have to create an account using gitlab, and request approval using fedi bot instead of using fedi bot to create an account for them.

Might be a better idea, but this would require modifying GitLab, which is something I would rather never do. I guess we will go with a simple implementation first and then implement more spam-resistant solutions as problems appear.

Owner

If we ever choose to pay for any of GitLab's non-free features, all these spammer accounts would count toward the licensing. That's one thing that concerns me. They also add load, and they could be using us as free file hosting for malware payloads, etc.

Author
Member

I wanted to send an MR to a GNOME project, and they have a similar issue on their GitLab. They allow people to ask for an account, pretty similar to how we do it now (only via mail instead of chat), and I must say, it felt pretty demotivating... I really don't want people who want to contribute to Pleroma (whether it's code, docs or issues) to feel the same here...

What they also allow, however (which we don't), is signing up via another account (in their case GitLab, GitHub or Google).

Maybe we could have something like that too? I think at least GitLab should be acceptable. There may be objections to things like GitHub and Google, but if it doesn't add trackers or anything to the login page (which may or may not be the case, idk), I don't really see a problem, because people can still choose GitLab or ask for an account via the chat (or some other alternative that may come up in the future).

Login page for the GNOME GitLab: https://gitlab.gnome.org/users/sign_in

Author
Member

Registrations are currently open again. In case we close them again, I want to document some things from chat:

  • IRC isn't ideal because it's not persistent and sometimes things go unnoticed
  • Sign-up approvals are a thing
      • https://docs.gitlab.com/ee/user/admin_area/settings/sign_up_restrictions.html#require-administrator-approval-for-new-sign-ups
      • https://docs.gitlab.com/ee/user/admin_area/settings/sign_up_restrictions.html
      • Possibly unworkable if there's too much spam?
  • Email or a fedi account could be an alternative to IRC
      • A fedi bot was already an idea, so a plain account could be a first step before automating it
  • There's an API to create accounts, so someone could make a registration page with a simple captcha: https://docs.gitlab.com/ee/api/users.html#user-creation
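For the "registration page with a simple captcha" idea, one stateless option is an arithmetic question whose expected answer is signed with a server-side secret, so the form needs no session storage. This is a hypothetical Python sketch (the secret, the 10-minute validity window and the function names are all assumptions), not how any existing form works. It would only deter generic bots that crawl for open GitLab instances: the answer space is tiny, and a token can be reused until it expires.

```python
import hashlib
import hmac
import random
import time

SECRET = b"change-me"  # hypothetical server-side secret, keep it private
MAX_AGE = 600          # seconds a challenge stays valid (arbitrary choice)
_rng = random.SystemRandom()


def _sign(answer, issued):
    """HMAC-SHA256 over the expected answer and the issue time."""
    msg = f"{answer}:{issued}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def new_challenge(now=None):
    """Return (question, token); the token goes in a hidden form field."""
    a, b = _rng.randint(1, 9), _rng.randint(1, 9)
    issued = int(time.time() if now is None else now)
    return f"What is {a} + {b}?", f"{issued}:{_sign(a + b, issued)}"


def check_answer(answer, token, now=None):
    """True if the answer is correct and the token is neither expired nor forged."""
    try:
        issued_text, signature = token.split(":", 1)
        issued = int(issued_text)
        value = int(answer.strip())
    except ValueError:
        return False
    current = int(time.time() if now is None else now)
    if not 0 <= current - issued <= MAX_AGE:
        return False
    # Constant-time comparison; encode so non-ASCII input can't raise.
    return hmac.compare_digest(signature.encode("utf-8"),
                               _sign(value, issued).encode("utf-8"))
```

Only when `check_answer` succeeds would the page go on to call the user creation endpoint linked above.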
Author
Member

Another option I feel should be mentioned is to move away from a self-hosted instance. This means finding a good alternative that everyone is happy with. Codeberg could be that[1]; maybe there are others.

Advantages of not self-hosting:

  • No more (or at least significantly less) spam management
  • No more managing updates
  • A whole class of infra problems is no longer ours to deal with
  • ...

Needed properties:

  • Good UX
  • CI should still be possible
  • We should be fairly confident that the instance won't suddenly go down
  • FLOSS
  • Managed by a community/organisation we can get behind
  • Others?

This also means migrating over, which is probably a bunch of work. So we'll need to make sure the eventual benefits outweigh the effort.

[1] https://codeberg.org/

What I can say about Codeberg:

  • It uses Gitea, which has the same MR/PR flow as GitLab; the UX is similar, I think
  • Many other projects are on there as well, so there's a good chance people already have an account (or if they make one, they can use it later for other projects too). They are also openly in favour of adding federation once Gitea supports it.
  • CI is possible with Woodpecker, but don't ask me for details because I've never worked with it, and I think it's still in some alpha stage right now
  • It's already becoming a big name, so I think we can be confident it won't suddenly shut down
  • Gitea is FLOSS (and not open-core like GitLab). There are some custom changes on Codeberg, but changes are generally pushed upstream.
  • Managed by a community (not a for-profit)
  • 100% aimed at FLOSS (non-FLOSS code is not allowed)
  • They are on fedi: https://mastodon.technology/@codeberg
  • ...
Author
Member

A first stab at a custom register form: https://codeberg.org/ilja/simple_register_form

4 participants
Reference: pleroma/pleroma-meta#54