Draft: Fixing GIN search performance
I recently checked search performance on pleroma.soykaf.com, which uses the default GIN index and contains around 35 million posts. The results were consistent with the index not being used at all.
I investigated a bit and found that when just getting the objects (in a query like select * from objects where fts_content @@ plainto_tsquery('english', 'tzag')
) it was reasonably fast (15~150ms) for any kind of query. Common words like 'nsfw' or 'cofe' would just do a backwards search, rarer or non-existent words like 'tzag' would use the gin index.
It seems that the postgres planner can not deal very well with the joins we're doing. I also tried ordering by the object id instead of the activity id, but this did not speed up things.
I could get good speeds by getting the object ids in a subquery (and limiting / offsetting / ordering there) and then joining over the activities.
Sadly, this isn't really an option we can pursue because we also do some other filters on the activities (actors, blocks, etc). So my current idea is to put the fts_content on the activity table instead of the object table. As always, having a unified table would be better, but I don't think we'll do that anytime soon. Still, this should give the planner a better idea of what we expect it to do.
I have not tested this yet on the soykaf db, but will do so when I find the time.
Things to do
- Clean up our content before vectorizing it: #1384
- Do the same thing for RUM.
- Make a mix task / background migrations that takes a few thousand activities at a time and adds the fts content. Doing it in a normal migration is prohibitively expensive (10+ hours on a big db). The trigger will take care of new activities.
- See if we can get away with this kind of trigger or if we also need the inverse, a trigger on objects that will update the corresponding activity.