Skip to content

Notifications: filter on users rather than activities

Matthieu Rakotojaona requested to merge rakoo/pleroma:develop into develop

The issue in #3218 is that the query looks at blocked domains in the entities to potentially return. Since there is no column or index for the domain but only the full ap_id, pg has to sequentially scan the full table, extract the domain, and check.

One solution could have been to create that index with the extracted domain, but that's one more index to manage.

The solution in this MR is to do the scan not on the activities table (which will always be the biggest one) but on the users table, since it's expected to have far fewer entries. Here's the EXPLAIN ANALYZE:

EXPLAIN ANALYSE SELECT
 n0."id",
 n0."seen",
 n0."type",
 n0."user_id",
 n0."activity_id",
 n0."inserted_at",
 n0."updated_at",
 a1."id",
 a1."data",
 a1."local",
 a1."actor",
 a1."recipients",
 a1."inserted_at",
 a1."updated_at",
 o2."id",
 o2."data",
 o2."inserted_at",
 o2."updated_at"
FROM "notifications" AS n0
INNER JOIN "activities" AS a1 ON a1."id" = n0."activity_id"
LEFT OUTER JOIN "objects" AS o2 ON (o2."data"->>'id') = associated_object_id(a1."data")
INNER JOIN "users" AS u3 ON u3."ap_id" = a1."actor"
LEFT OUTER JOIN "thread_mutes" AS t4 ON (t4."user_id" = '0000017d-2fdc-ab78-4e1d-d01859de0000') AND (t4."context" = a1."data"->>'context')
WHERE (n0."user_id" = '0000017d-2fdc-ab78-4e1d-d01859de0000')
AND (u3."is_active")
AND (NOT (u3."ap_id" = ANY(ARRAY[]::text[])))
AND (t4."user_id" IS NULL)
AND (NOT (u3."ap_id" = ANY(ARRAY[/* ELIDED FOR PRIVACY */])))
AND (NOT (substring(u3."ap_id" from '.*://([^/]*)') = ANY(ARRAY[]::text[])) 
	OR u3."ap_id" = ANY(
		SELECT ap_id FROM users AS u 
		INNER JOIN following_relationships AS fr
    	ON u.id = fr.following_id 
    	WHERE fr.follower_id = '0000017d-2fdc-ab78-4e1d-d01859de0000' 
    	AND fr.state = 2)
    )
ORDER BY n0."id" desc nulls last
LIMIT 20
;
                                                                                                                                                                                                                                                                                  QUERY PLAN                                                                                                                                                                                                                                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=89147.53..89147.58 rows=20 width=2039) (actual time=285.930..285.938 rows=20 loops=1)
   ->  Sort  (cost=89147.53..89149.50 rows=788 width=2039) (actual time=285.929..285.934 rows=20 loops=1)
         Sort Key: n0.id DESC NULLS LAST
         Sort Method: top-N heapsort  Memory: 162kB
         ->  Nested Loop Anti Join  (cost=63297.66..89126.56 rows=788 width=2039) (actual time=232.389..283.957 rows=3323 loops=1)
               Join Filter: ((t4.context)::text = (a1.data ->> 'context'::text))
               ->  Nested Loop Left Join  (cost=63297.51..89104.60 rows=788 width=2039) (actual time=232.370..282.576 rows=3323 loops=1)
                     ->  Hash Join  (cost=63296.71..88076.90 rows=788 width=756) (actual time=231.866..253.443 rows=3323 loops=1)
                           Hash Cond: ((a1.actor)::text = (u3.ap_id)::text)
                           ->  Nested Loop  (cost=0.43..24771.88 rows=3330 width=756) (actual time=0.042..19.411 rows=3330 loops=1)
                                 ->  Seq Scan on notifications n0  (cost=0.00..86.03 rows=3330 width=61) (actual time=0.021..0.875 rows=3330 loops=1)
                                       Filter: (user_id = '0000017d-2fdc-ab78-4e1d-d01859de0000'::uuid)
                                       Rows Removed by Filter: 112
                                 ->  Index Scan using activities_pkey on activities a1  (cost=0.43..7.41 rows=1 width=695) (actual time=0.005..0.005 rows=1 loops=3330)
                                       Index Cond: (id = n0.activity_id)
                           ->  Hash  (cost=62356.72..62356.72 rows=75165 width=38) (actual time=231.378..231.379 rows=75641 loops=1)
                                 Buckets: 131072  Batches: 1  Memory Usage: 6155kB
                                 ->  Bitmap Heap Scan on users u3  (cost=3414.06..62356.72 rows=75165 width=38) (actual time=15.891..202.407 rows=75641 loops=1)
                                       Recheck Cond: is_active
                                       Filter: ((ap_id)::text <> ALL ('{/* ELIDED FOR PRIVACY */}'::text[]))
                                       Rows Removed by Filter: 2
                                       Heap Blocks: exact=41161
                                       ->  Bitmap Index Scan on users_is_active_index  (cost=0.00..824.17 rows=75167 width=0) (actual time=8.913..8.914 rows=75650 loops=1)
                                             Index Cond: (is_active = true)
                     ->  Index Scan using objects_unique_apid_index on objects o2  (cost=0.80..1.30 rows=1 width=1283) (actual time=0.005..0.005 rows=1 loops=3323)
                           Index Cond: ((data ->> 'id'::text) = (associated_object_id(a1.data))::text)
               ->  Materialize  (cost=0.14..8.17 rows=1 width=516) (actual time=0.000..0.000 rows=0 loops=3323)
                     ->  Index Only Scan using unique_index on thread_mutes t4  (cost=0.14..8.16 rows=1 width=516) (actual time=0.012..0.012 rows=0 loops=1)
                           Index Cond: (user_id = '0000017d-2fdc-ab78-4e1d-d01859de0000'::uuid)
                           Heap Fetches: 0
 Planning Time: 4.547 ms
 Execution Time: 286.993 ms
(32 lignes)

On my dev machine this query goes from ~5s to ~300ms

This is a partial fix for #3218 : it works under the assumption that num(activities)/num(users). This assumption is not verified when there's a lot of spam, because then new accounts are created and the ratio goes up. The proper fix is to delete notifications of blocked/muted users and domains and fetch all the rest, but that's destructive so I didn't want to do that for my very first MR

Checklist

  • Adding a changelog: In the changelog.d directory, create a file named <code>.<type>.

Merge request reports