HTTP 500 error on viewing home timeline, related to PostgreSQL timeouts
Environment
- Installation type (OTP or From Source): OTP, migrated from Source several years ago.
- Pleroma version (could be found in the "Version" tab of settings in Pleroma-FE): 2.4.5
- Elixir version (elixir -v for from source installations, N/A for OTP): N/A
- Operating system: Ubuntu 20.04.5 LTS
- PostgreSQL version (psql -V): 12.13
- Hardware: DigitalOcean 14$ general-purpose droplet (was 6$ originally)
- Instance type: small, under 10 local users, infrequent posting
- The database size, as reported by PostgreSQL, is about 1177 MB
Bug description
Opening the home timeline in Pleroma-FE as a logged-in admin user causes a timeout with a 500 error. While the timeline is loading, CPU usage jumps to 100%, with PostgreSQL being the most active process. If the timeline is left alone, it eventually loads some "new" statuses, which can then be displayed and read. The user account has several content filters active, but does not have a permanent effect. The log for the failed request contains the following message:
Jan 23 06:52:11 zombienet.org pleroma[79998]: [error] #PID<0.12817.4> running Pleroma.Web.Endpoint (connection #PID<0.12816.4>, stream id 1) terminated
Server: zombienet.org:80 (http)
Request: GET /api/v1/timelines/home?max_id=ARukhBgzdDCrDwdBS4&with_muted=false&limit=20
** (exit) an exception was raised:
** (DBConnection.ConnectionError) tcp recv: closed (the connection was closed by the pool, possibly due to a timeout or because the pool has been terminated)
(ecto_sql 3.6.2) lib/ecto/adapters/sql.ex:760: Ecto.Adapters.SQL.raise_sql_call_error/1
(ecto_sql 3.6.2) lib/ecto/adapters/sql.ex:693: Ecto.Adapters.SQL.execute/5
(ecto 3.6.2) lib/ecto/repo/queryable.ex:224: Ecto.Repo.Queryable.execute/4
(ecto 3.6.2) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
(pleroma 2.4.5-2-gd8e32646) lib/pleroma/pagination.ex:40: Pleroma.Pagination.fetch_paginated/4
(pleroma 2.4.5-2-gd8e32646) lib/pleroma/web/activity_pub/activity_pub.ex:484: Pleroma.Web.ActivityPub.ActivityPub.fetch_activities/3
(pleroma 2.4.5-2-gd8e32646) lib/pleroma/web/mastodon_api/controllers/timeline_controller.ex:56: Pleroma.Web.MastodonAPI.TimelineController.home/2
(pleroma 2.4.5-2-gd8e32646) lib/pleroma/web/mastodon_api/controllers/timeline_controller.ex:5: Pleroma.Web.MastodonAPI.TimelineController.action/2
Trying to load the timeline in toot and Fedilab also fails, so this is not limited to Pleroma-FE.
Trying to load any other timeline, including the local and federated timelines for both logged-in users and guests, produces erratic results.
The instance used to work fine for several years, before becoming unresponsive.
Things I have attempted so far to solve the issue:
- Upgrading to a newer Pleroma release
- Applying tweaks from pgtune
- VACUUM FULL
- Increasing the server's resources (from 6$ to 14$ worth)
- ./bin/pleroma_ctl database prune_objects --vacuum
- Optimizing the BEAM as suggested by the manual
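The database-side steps from the list above could be reproduced roughly as in the sketch below. The database name (pleroma) and running psql as the postgres system user are assumptions, not details stated in this report; the pleroma_ctl invocation is copied from the list. The script only prints the commands unless RUN=1 is set, because VACUUM FULL takes exclusive locks on the tables it rewrites.

```shell
#!/usr/bin/env bash
# Sketch of the database maintenance steps listed above.
# Assumptions (not stated in the report): the database is named "pleroma"
# and psql is invoked as the "postgres" system user.
DB_NAME="${DB_NAME:-pleroma}"

# Print each command by default; set RUN=1 to actually execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

maintenance_steps() {
  # Report the database size, to compare before and after maintenance.
  run sudo -Hu postgres psql -d "$DB_NAME" -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
  # Rewrite all tables to reclaim space; queries on a table block while it is rewritten.
  run sudo -Hu postgres psql -d "$DB_NAME" -c "VACUUM FULL;"
  # Prune old remote objects, then vacuum (command as given in the list above).
  run ./bin/pleroma_ctl database prune_objects --vacuum
}

maintenance_steps
```

Running it without RUN=1 gives a chance to review the three commands before anything touches the database.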
What steps might I take to further diagnose the issue and, eventually, return to a fully working, responsive instance?