Search: Use RUM index.
This is somewhat experimental, as it relies on a postgresql extension that's not usually packaged. The performance wins are substantial though, and it would enable efficient timeframe-based searches.
Running the benchmark for 'cofe' on a recent soykaf database.
GIN index, after VACUUM ANALYZE. table size: 48 GB, index size: 1793 MB.
Name ips average deviation median 99th %
search 1.44 696.38 ms ±4.84% 703.09 ms 730.72 ms
2019-05-10T16:57:06.32907 daemon.info: May 10 18:57:06 postgres: LOG: duration: 717.003 ms plan:
2019-05-10T16:57:06.32913 daemon.info: May 10 18:57:06 postgres: Query Text: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (a0."data"->>'type' = 'Create') AND ('https://www.w3.org/ns/activitystreams#Public' = ANY(a0."recipients")) AND (to_tsvector('english', o1."data"->>'content') @@ plainto_tsquery('english', $1)) ORDER BY a0."id" DESC LIMIT 20
2019-05-10T16:57:06.32915 daemon.info: May 10 18:57:06 postgres: Limit (cost=77517.21..77517.26 rows=20 width=1434)
2019-05-10T16:57:06.32917 daemon.info: May 10 18:57:06 postgres: -> Sort (cost=77517.21..77530.87 rows=5463 width=1434)
2019-05-10T16:57:06.32920 daemon.info: May 10 18:57:06 postgres: Sort Key: a0.id DESC
2019-05-10T16:57:06.32921 daemon.info: May 10 18:57:06 postgres: -> Nested Loop (cost=90.99..77371.85 rows=5463 width=1434)
2019-05-10T16:57:06.32924 daemon.info: May 10 18:57:06 postgres: -> Bitmap Heap Scan on objects o1 (cost=90.42..14027.99 rows=10248 width=690)
2019-05-10T16:57:06.32926 daemon.info: May 10 18:57:06 postgres: Recheck Cond: (to_tsvector('english'::regconfig, (data ->> 'content'::text)) @@ '''cofe'''::tsquery)
2019-05-10T16:57:06.32927 daemon.info: May 10 18:57:06 postgres: -> Bitmap Index Scan on objects_fts (cost=0.00..87.86 rows=10248 width=0)
2019-05-10T16:57:06.32929 daemon.info: May 10 18:57:06 postgres: Index Cond: (to_tsvector('english'::regconfig, (data ->> 'content'::text)) @@ '''cofe'''::tsquery)
2019-05-10T16:57:06.32931 daemon.info: May 10 18:57:06 postgres: -> Index Scan using activities_create_objects_index on activities a0 (cost=0.56..6.15 rows=3 width=744)
2019-05-10T16:57:06.32933 daemon.info: May 10 18:57:06 postgres: Index Cond: (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = (o1.data ->> 'id'::text))
2019-05-10T16:57:06.32938 daemon.info: May 10 18:57:06 postgres: Filter: (((data ->> 'type'::text) = 'Create'::text) AND ('https://www.w3.org/ns/activitystreams#Public'::text = ANY ((recipients)::text[])))
RUM index, after VACUUM ANALYZE. table size: 48 GB, index size: 6740 MB
Name ips average deviation median 99th %
search 39.77 25.14 ms ±19.94% 23.76 ms 43.88 ms
2019-05-11T07:57:00.58987 daemon.info: May 11 09:57:00 postgres: LOG: duration: 24.027 ms plan:
2019-05-11T07:57:00.59001 daemon.info: May 11 09:57:00 postgres: Query Text: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (a0."data"->>'type' = 'Create') AND ('https://www.w3.org/ns/activitystreams#Public' = ANY(a0."recipients")) AND (o1."fts_content" @@ plainto_tsquery('english', $1)) ORDER BY o1."inserted_at" <=> now()::date LIMIT 20
2019-05-11T07:57:00.59007 daemon.info: May 11 09:57:00 postgres: Limit (cost=9.62..298.63 rows=20 width=1451)
2019-05-11T07:57:00.59010 daemon.info: May 11 09:57:00 postgres: -> Nested Loop (cost=9.62..1168522.63 rows=80863 width=1451)
2019-05-11T07:57:00.59013 daemon.info: May 11 09:57:00 postgres: -> Index Scan using objects_fts on objects o1 (cost=9.06..171034.96 rows=154500 width=699)
2019-05-11T07:57:00.59017 daemon.info: May 11 09:57:00 postgres: Index Cond: (fts_content @@ plainto_tsquery('english'::regconfig, $1))
2019-05-11T07:57:00.59024 daemon.info: May 11 09:57:00 postgres: Order By: (inserted_at <=> ((now())::date)::timestamp without time zone)
2019-05-11T07:57:00.59029 daemon.info: May 11 09:57:00 postgres: -> Index Scan using activities_create_objects_index on activities a0 (cost=0.56..6.42 rows=3 width=744)
2019-05-11T07:57:00.59034 daemon.info: May 11 09:57:00 postgres: Index Cond: (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = (o1.data ->> 'id'::text))
2019-05-11T07:57:00.59039 daemon.info: May 11 09:57:00 postgres: Filter: (((data ->> 'type'::text) = 'Create'::text) AND ('https://www.w3.org/ns/activitystreams#Public'::text = ANY ((recipients)::text[])))
Merge request reports
Activity
I think we can do the following
config :pleroma, :database, rum_enabled: true
Then create two new tasks in
pleroma.database
. One of which will enable rum and drop gin, and the other one would drop rum and enable gin. I don't think it is a good idea to force existing installations to install rum considering it does not bring any new functionality, only performance improvements. Also iirc not many hosted postgres db providers have rum in their extension list.That's a very good question.
Today we don't really support optional migrations out of the box.
One option is to wrap the whole change function if a conditional check:
if Application.get_env(:pleroma, :database)[:rum_enabled] do ... end
The downside of this approach is that once you run the migration, if you enable run, you can't enable it again.
Another option is to create a task that loads migrations from other directories. For example, you could put those migrations in
priv/optional_migrations/rum_enabled/XYZ_do_something.exs
. Then, in order to run those migrations, you can do something like this:Ecto.Migrator.with_repo(repo, fn repo -> # The default migration directory Ecto.Migrator.run(repo, :up, all: true) if Application.get_env(:pleroma, :database)[:rum_enabled] do Ecto.Migrator.run(repo, "priv/optional_migrations/rum_enabled", :up, all: true) end end)
Then you put that inside "pleroma.migrate" and then you will always run all of the appropriate migrations depending on the user configuration. The benefit of this approach is that you have full control but you may also end up re-implementing the existing ecto.migrate/ecto.rollback commands. Note also we are using
Ecto.Migrator.with_repo/2
, which requires Ecto.SQL 3.1.2+ (released this weekend).Another idea is to send a PR to Ecto.SQL so we support a
--migration-path
option in Ecto. It should be relatively straight-forward to do so, and we will be happy to provide any guidance. With such flag, then people interested in using those special features can run those commands themselves, but it means they will be the ones doing most of the work.The other thing you will want to consider is testing. If you start having a bunch of optional migrations, you need to make sure your suite passes on all possible combinations. Luckily this can be done by using multiple environment variables on CI and handling those accordingly.
Thoughts?
I am also for doing
--migrations-path
, and maybe aliasingpleroma.migrate
/pleroma.rollback
todefp migrate(args) do Mix.Tasks.Ecto.Migrate.run(args) if Pleroma.Config.get([:database, :rum_enabled]), do: Mix.Tasks.Ecto.Migrate.run(["--migrations-path priv/optional_migrations/rum_enabled" | args]) end defp rollback(args) do Mix.Tasks.Ecto.Rollback.run(args) if Pleroma.Config.get([:database, :rum_enabled]), do: Mix.Tasks.Ecto.Rollback.run(["--migrations-path priv/optional_migrations/rum_enabled" | args]) end
for convenience
So I went ahead and implementations
--migrations-path
here: https://github.com/elixir-ecto/ecto_sql/commit/e839a9a327b632d73533ac8105ba360bc831cf83Feel free to give it a try and, if it works, we can do a new release.
Btw, you will have to split the option from the argument when calling run, like this:
Mix.Tasks.Ecto.Migrate.run(["--migrations-path", "priv/optional_migrations/rum_enabled" | args])
Other than that everything looks good!
added 1 commit
- f1e67bdc - Search: Add optional rum indexing / searching.
mentioned in merge request !1160 (merged)
added 157 commits
-
f1e67bdc...e5b34f5e - 156 commits from branch
develop
- 412a3d8a - Merge branch 'develop' of git.pleroma.social:pleroma/pleroma into rum-index
-
f1e67bdc...e5b34f5e - 156 commits from branch
- Automatically resolved by lain
- Automatically resolved by lain
added 1 commit
- ef63cf70 - CI: Use the correct image with the correct hostname.
mentioned in commit d2dacadb