Unable to see reports that reference statuses
Environment
- Installation type (OTP or From Source): Source (develop branch, 481f50bc)
- Elixir version (
elixir -v
for from source installations, N/A for OTP): 1.14.1 - Operating system: Fedora Linux 36
- PostgreSQL version (
psql -V
): 10.23
Bug description
The f.l4.pm
single-user instance has been operational since approximately 2019, with a data directory that is clocking at 11.9GB. I recently migrated it to a new server, and thought it would be good to validate that incoming reports work, turns out only some of them did, only ones that did not reference a status. As soon as an incoming report comes with a status, it is not added to the activities
table, with the following error
[error]Postgrex.Protocol (#PID<0.1005.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.1613.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.1613.0> was at location:
:prim_inet.recv0/3
(postgrex 0.15.9) lib/postgrex/protocol.ex:2972: Postgrex.Protocol.msg_recv/4
(postgrex 0.15.9) lib/postgrex/protocol.ex:2014: Postgrex.Protocol.recv_bind/3
(postgrex 0.15.9) lib/postgrex/protocol.ex:1869: Postgrex.Protocol.bind_execute_close/4
(db_connection 2.4.0) lib/db_connection/holder.ex:325: DBConnection.Holder.holder_apply/4
(db_connection 2.4.0) lib/db_connection.ex:1314: DBConnection.run_execute/5
(db_connection 2.4.0) lib/db_connection.ex:574: DBConnection.parsed_prepare_execute/5
(db_connection 2.4.0) lib/db_connection.ex:566: DBConnection.prepare_execute/4
[debug]QUERY ERROR source="activities" db=15001.6ms
SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = associated_object_id(a0."data") WHERE (associated_object_id((a0."data")) = $1) ORDER BY a0."id" LIMIT 1 ["https://f.l4.pm/objects/1d2d9a40-7953-4c24-86c4-f29dbc9deec4"]
[debug]QUERY ERROR db=0.0ms
I drilled down the code path to Pleroma.Web.ActivityPub.Utils.build_flag_object/1
, more specifically the Activity.get_by_object_ap_id_with_object(id)
call is locking up.
case Activity.get_by_ap_id_with_object(id) do
%Activity{} = activity ->
build_flag_object(activity)
nil ->
if activity = Activity.get_by_object_ap_id_with_object(id) do
build_flag_object(activity)
else
%{"id" => id, "deleted" => true}
end
end
def get_by_object_ap_id_with_object(ap_id) when is_binary(ap_id) do
ap_id
|> Queries.by_object_id()
|> with_preloaded_object()
|> first()
|> Repo.one()
end
Running EXPLAIN
on the query reveals that PostgreSQL should be using the relevant indexes for its execution, and if I run that query manually, it finishes "instantly" (probably around 100ms), but there's no data provided
pleroma_dev=# explain SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 LEFT OUTER JOIN "objects" AS o1 ON (o1."data"->>'id') = associated_object_id(a0."data") WHERE ((a0."data")->>'id' = 'https://f.l4.pm/objects/1d2d9a40-7953-4c24-86c4-f29dbc9deec4');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.36..5.80 rows=1 width=2141)
-> Index Scan using activities_unique_apid_index on activities a0 (cost=0.56..2.77 rows=1 width=946)
Index Cond: ((data ->> 'id'::text) = 'https://f.l4.pm/objects/1d2d9a40-7953-4c24-86c4-f29dbc9deec4'::text)
-> Index Scan using objects_unique_apid_index on objects o1 (cost=0.80..3.02 rows=1 width=1195)
Index Cond: ((data ->> 'id'::text) = (associated_object_id(a0.data))::text)
(5 rows)
pleroma_dev=# SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 LEFT OUTER JOIN "objects" AS o1 ON (o1."data"->>'id') = associated_object_id(a0."data") WHERE ((a0."data")->>'id' = 'https://f.l4.pm/objects/1d2d9a40-7953-4c24-86c4-f29dbc9deec4')
[...]
(0 rows)
The migration between servers involved a pg_dump
from the old server, a psql -f dump.sql
in the new one, and a mix ecto.migrate
to update some 200 commits that I was missing from my instance's branch. Trying to reach the URL itself redirects to the /notice/
URL, which then displays the original post that was being reported upon. My instance has required API authentication, so the post itself can't be shown to unauthenticated users, apologies.
Could this be caused by staying on PostgreSQL 10? Or could there be a deeper issue in how the index is constructed?
The database logs show that Ecto itself stopped execution of the query, and my server's load is not high.
==> /var/lib/pgsql/10/data/log/postgresql-Sat.log <==
2022-11-12 04:41:30.447 UTC [76494] ERROR: canceling statement due to user request
2022-11-12 04:41:30.447 UTC [76494] CONTEXT: PL/pgSQL function associated_object_id(jsonb) line 8 at assignment
2022-11-12 04:41:30.447 UTC [76494] STATEMENT: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = associated_object_id(a0."data") WHERE (associated_object_id((a0."data")) = $1) ORDER BY a0."id" LIMIT 1
2022-11-12 04:41:30.448 UTC [76494] LOG: could not send data to client: Broken pipe
2022-11-12 04:41:30.448 UTC [76494] FATAL: connection to client lost