Scaling Pleroma for a worldwide userbase
Normally you would expect users to join a Fediverse community that is both relevant to their interests and geographically nearby for performance reasons. e.g., someone in Japan should join a Japanese server and use federation to communicate with users across the world. Keeping latency low this way avoids most performance bottlenecks.
However, there are situations where this is not possible. I administer a customized Pleroma server that automatically enrolls an existing community/userbase with thousands of members spread worldwide, and we observe performance issues as a result. e.g., users in Australia struggle to upload media to our server in Chicago.
We have a few options at our disposal to improve general performance, such as setting `config :pleroma, Pleroma.Upload, base_url` to point media at a CDN. This only solves performance for consumption of content, not creation.
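For reference, a minimal sketch of that setting, assuming a hypothetical CDN hostname fronting this server's media path:

```elixir
# config/prod.secret.exs -- serve uploaded media through a CDN.
# "media.cdn.example.com" is a placeholder for a CDN distribution
# that pulls from this server's /media/ path.
config :pleroma, Pleroma.Upload,
  base_url: "https://media.cdn.example.com/media/"
```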
To achieve good performance we need to reduce TCP connection latency by moving these activities to edge nodes geographically dispersed around the world. The problem is that this is not currently feasible in Pleroma. Simply moving Pleroma entirely behind a CDN is unwise because of websockets: CDNs do not like websockets.
Cloudflare supports websockets, but only a limited number of connections unless you pay for premium plans. Akamai's websocket support is unclear; based on my research, it does not appear to be permitted for production traffic.
This is unsurprising because websockets would be a serious strain on a CDN anyway. We can build our own CDN (and we are), but that doesn't change the reality that it's not a great solution to the problem.
This leaves us with two options: a major engineering effort, and a lame hack:
- OTP clustering, which will require significant dev investment, but is something we want long term anyway for building HA, load-balanced Pleroma clusters (see the sketch after this list).
- Some kind of API extension to support this, baked into the apps we influence (Roma, etc.). e.g., hint that `/api/v1/statuses` should be accessed at a different `base_url`, which can be proxied by a CDN, with a fallback if that endpoint is unavailable (one possible shape is sketched after this list).
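For the clustering option, a minimal sketch of how geographically dispersed nodes could be joined, assuming the libcluster library; the node names are placeholders, and Pleroma itself would still need real work to distribute its workload across such a cluster:

```elixir
# Hypothetical: join dispersed BEAM nodes into one Erlang cluster with
# libcluster. Hosts are illustrative; nothing here ships with Pleroma.
config :libcluster,
  topologies: [
    pleroma_edge: [
      strategy: Cluster.Strategy.Epmd,
      config: [
        hosts: [
          :"pleroma@ord1.example.com",  # origin, Chicago
          :"pleroma@syd1.example.com"   # edge, Sydney
        ]
      ]
    ]
  ]

# Started from the application's supervision tree:
# {Cluster.Supervisor, [Application.get_env(:libcluster, :topologies),
#                       [name: Pleroma.ClusterSupervisor]]}
```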
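For the API-extension option, one possible shape for the hint, sketched as a hypothetical addition to the instance metadata; the `edge_base_url` key and the helper below are invented for illustration and exist nowhere in Pleroma's API today:

```elixir
# config/prod.secret.exs -- advertise an edge endpoint for write-heavy routes.
config :pleroma, :instance,
  edge_base_url: "https://edge.example.com"

# When rendering instance metadata, include the hint if configured:
defp maybe_put_edge_hint(metadata) do
  case Pleroma.Config.get([:instance, :edge_base_url]) do
    nil -> metadata
    url -> Map.put(metadata, :edge_base_url, url)
  end
end
```

A client would then send `POST /api/v1/statuses` and media uploads to `edge_base_url` when it is present and reachable, reverting to the main `base_url` otherwise.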
Are there any other things we should consider? How else can this problem be approached?