Pleroma(PostgreSQL) seems not to use index to accelerate search for other languages
Today I successfully install pg_jieba
extension to enable full text search for CJK.
But then I notice that the search in web page is really slow, while in psql
console is fast.
After looking up in source code, it seems that when I search in web page, it will execute something like:
pleroma=# explain analyze select * from objects where to_tsvector(data->>'content') @@ websearch_to_tsquery('中文搜索') order by inserted_at desc limit 20;
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34142.56..34142.57 rows=1 width=509) (actual time=12623.348..12623.404 rows=13 loops=1)
-> Sort (cost=34142.56..34142.57 rows=1 width=509) (actual time=12623.346..12623.401 rows=13 loops=1)
Sort Key: inserted_at DESC
Sort Method: quicksort Memory: 49kB
-> Gather (cost=1000.00..34142.55 rows=1 width=509) (actual time=3695.397..12623.374 rows=13 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on objects (cost=0.00..33142.45 rows=1 width=509) (actual time=5618.056..12589.391 rows=4 loops=3)
Filter: (to_tsvector((data ->> 'content'::text)) @@ websearch_to_tsquery('
中文搜索'::text))
Rows Removed by Filter: 38638
Planning Time: 0.191 ms
Execution Time: 12623.427 ms
(12 rows)
or:
pleroma=# explain analyze select * from objects where to_tsvector(data->>'content') @@ plainto_tsquery('中文搜索') order by inserted_at desc limit 20;
-----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=34142.56..34142.57 rows=1 width=509) (actual time=13951.493..13955.684 rows=13 loops=1)
-> Sort (cost=34142.56..34142.57 rows=1 width=509) (actual time=13951.491..13955.681 rows=13 loops=1)
Sort Key: inserted_at DESC
Sort Method: quicksort Memory: 49kB
-> Gather (cost=1000.00..34142.55 rows=1 width=509) (actual time=4150.413..13955.658 rows=13 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on objects (cost=0.00..33142.45 rows=1 width=509) (actual time=7498.200..13930.630 rows=4 loops=3)
Filter: (to_tsvector((data ->> 'content'::text)) @@ plainto_tsquery('中文搜索'::text))
Rows Removed by Filter: 38638
Planning Time: 0.148 ms
Execution Time: 13955.702 ms
(12 rows)
But when I specify configuration, magic appears:
pleroma=# explain analyze select * from objects where to_tsvector('public.jiebaqry', data->>'content') @@ plainto_tsquery('public.jiebaqry', '中文搜索') order by inserted_at desc limit 20;
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=60.28..60.28 rows=1 width=509) (actual time=0.074..0.076 rows=13 loops=1)
-> Sort (cost=60.28..60.28 rows=1 width=509) (actual time=0.073..0.074 rows=13 loops=1)
Sort Key: inserted_at DESC
Sort Method: quicksort Memory: 49kB
-> Bitmap Heap Scan on objects (cost=56.00..60.27 rows=1 width=509) (actual time=0.054..0.064 rows=13 loops=1)
Recheck Cond: (to_tsvector('jiebaqry'::regconfig, (data ->> 'content'::text)) @@ '''中文'' & ''搜索'' & ''中文搜索'''::tsquery)
Heap Blocks: exact=12
-> Bitmap Index Scan on objects_fts (cost=0.00..56.00 rows=1 width=0) (actual time=0.050..0.050 rows=14 loops=1)
Index Cond: (to_tsvector('jiebaqry'::regconfig, (data ->> 'content'::text)) @@ '''中文'' & ''搜索'' & ''中文搜索'''::tsquery)
Planning Time: 0.140 ms
Execution Time: 0.096 ms
(11 rows)
pleroma=# explain analyze select * from objects where to_tsvector('public.jiebaqry', data->>'content') @@ websearch_to_tsquery('public.jiebaqry', '中文搜索') order by inserted_at desc limit 20;
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=60.28..60.28 rows=1 width=509) (actual time=0.093..0.095 rows=13 loops=1)
-> Sort (cost=60.28..60.28 rows=1 width=509) (actual time=0.092..0.094 rows=13 loops=1)
Sort Key: inserted_at DESC
Sort Method: quicksort Memory: 49kB
-> Bitmap Heap Scan on objects (cost=56.00..60.27 rows=1 width=509) (actual time=0.068..0.081 rows=13 loops=1)
Recheck Cond: (to_tsvector('jiebaqry'::regconfig, (data ->> 'content'::text)) @@ '''中文'' & ''搜索'' & ''中文搜索'''::tsquery)
Heap Blocks: exact=12
-> Bitmap Index Scan on objects_fts (cost=0.00..56.00 rows=1 width=0) (actual time=0.062..0.062 rows=14 loops=1)
Index Cond: (to_tsvector('jiebaqry'::regconfig, (data ->> 'content'::text)) @@ '''中文'' & ''搜索'' & ''中文搜索'''::tsquery)
Planning Time: 0.162 ms
Execution Time: 0.116 ms
(11 rows)
That's weird. The server knows the exact text search configuration, but it doesn't use the index of it. Although it is easy to fix it, I still have no idea why this thing happens.
PostgreSQL: 13+226.pgdg16.04+1 Pleroma: 2.3.0-1-gb221d77a