Skip to content

Search: Use RUM index.

lain requested to merge rum-index into develop

This is somewhat experimental, as it relies on a postgresql extension that's not usually packaged. The performance wins are substantial though, and it would enable efficient timeframe-based searches.

Running the benchmark for 'cofe' on a recent soykaf database.

GIN index, after VACUUM ANALYZE. table size: 48 GB, index size: 1793 MB.

Name             ips        average  deviation         median         99th %
search          1.44      696.38 ms     ±4.84%      703.09 ms      730.72 ms
2019-05-10T16:57:06.32907 daemon.info: May 10 18:57:06 postgres: LOG:  duration: 717.003 ms  plan:
2019-05-10T16:57:06.32913 daemon.info: May 10 18:57:06 postgres: 	Query Text: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (a0."data"->>'type' = 'Create') AND ('https://www.w3.org/ns/activitystreams#Public' = ANY(a0."recipients")) AND (to_tsvector('english', o1."data"->>'content') @@ plainto_tsquery('english', $1)) ORDER BY a0."id" DESC LIMIT 20
2019-05-10T16:57:06.32915 daemon.info: May 10 18:57:06 postgres: 	Limit  (cost=77517.21..77517.26 rows=20 width=1434)
2019-05-10T16:57:06.32917 daemon.info: May 10 18:57:06 postgres: 	  ->  Sort  (cost=77517.21..77530.87 rows=5463 width=1434)
2019-05-10T16:57:06.32920 daemon.info: May 10 18:57:06 postgres: 	        Sort Key: a0.id DESC
2019-05-10T16:57:06.32921 daemon.info: May 10 18:57:06 postgres: 	        ->  Nested Loop  (cost=90.99..77371.85 rows=5463 width=1434)
2019-05-10T16:57:06.32924 daemon.info: May 10 18:57:06 postgres: 	              ->  Bitmap Heap Scan on objects o1  (cost=90.42..14027.99 rows=10248 width=690)
2019-05-10T16:57:06.32926 daemon.info: May 10 18:57:06 postgres: 	                    Recheck Cond: (to_tsvector('english'::regconfig, (data ->> 'content'::text)) @@ '''cofe'''::tsquery)
2019-05-10T16:57:06.32927 daemon.info: May 10 18:57:06 postgres: 	                    ->  Bitmap Index Scan on objects_fts  (cost=0.00..87.86 rows=10248 width=0)
2019-05-10T16:57:06.32929 daemon.info: May 10 18:57:06 postgres: 	                          Index Cond: (to_tsvector('english'::regconfig, (data ->> 'content'::text)) @@ '''cofe'''::tsquery)
2019-05-10T16:57:06.32931 daemon.info: May 10 18:57:06 postgres: 	              ->  Index Scan using activities_create_objects_index on activities a0  (cost=0.56..6.15 rows=3 width=744)
2019-05-10T16:57:06.32933 daemon.info: May 10 18:57:06 postgres: 	                    Index Cond: (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = (o1.data ->> 'id'::text))
2019-05-10T16:57:06.32938 daemon.info: May 10 18:57:06 postgres: 	                    Filter: (((data ->> 'type'::text) = 'Create'::text) AND ('https://www.w3.org/ns/activitystreams#Public'::text = ANY ((recipients)::text[])))

RUM index, after VACUUM ANALYZE. table size: 48 GB, index size: 6740 MB

Name             ips        average  deviation         median         99th %
search         39.77       25.14 ms    ±19.94%       23.76 ms       43.88 ms
2019-05-11T07:57:00.58987 daemon.info: May 11 09:57:00 postgres: LOG:  duration: 24.027 ms  plan:
2019-05-11T07:57:00.59001 daemon.info: May 11 09:57:00 postgres: 	Query Text: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (a0."data"->>'type' = 'Create') AND ('https://www.w3.org/ns/activitystreams#Public' = ANY(a0."recipients")) AND (o1."fts_content" @@ plainto_tsquery('english', $1)) ORDER BY o1."inserted_at" <=> now()::date LIMIT 20
2019-05-11T07:57:00.59007 daemon.info: May 11 09:57:00 postgres: 	Limit  (cost=9.62..298.63 rows=20 width=1451)
2019-05-11T07:57:00.59010 daemon.info: May 11 09:57:00 postgres: 	  ->  Nested Loop  (cost=9.62..1168522.63 rows=80863 width=1451)
2019-05-11T07:57:00.59013 daemon.info: May 11 09:57:00 postgres: 	        ->  Index Scan using objects_fts on objects o1  (cost=9.06..171034.96 rows=154500 width=699)
2019-05-11T07:57:00.59017 daemon.info: May 11 09:57:00 postgres: 	              Index Cond: (fts_content @@ plainto_tsquery('english'::regconfig, $1))
2019-05-11T07:57:00.59024 daemon.info: May 11 09:57:00 postgres: 	              Order By: (inserted_at <=> ((now())::date)::timestamp without time zone)
2019-05-11T07:57:00.59029 daemon.info: May 11 09:57:00 postgres: 	        ->  Index Scan using activities_create_objects_index on activities a0  (cost=0.56..6.42 rows=3 width=744)
2019-05-11T07:57:00.59034 daemon.info: May 11 09:57:00 postgres: 	              Index Cond: (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = (o1.data ->> 'id'::text))
2019-05-11T07:57:00.59039 daemon.info: May 11 09:57:00 postgres: 	              Filter: (((data ->> 'type'::text) = 'Create'::text) AND ('https://www.w3.org/ns/activitystreams#Public'::text = ANY ((recipients)::text[])))
Edited by lain

Merge request reports