Skip to content
Snippets Groups Projects

Search: Use RUM index.

Merged lain requested to merge rum-index into develop
All threads resolved!

This is somewhat experimental, as it relies on a postgresql extension that's not usually packaged. The performance wins are substantial though, and it would enable efficient timeframe-based searches.

Running the benchmark for 'cofe' on a recent soykaf database.

GIN index, after VACUUM ANALYZE. table size: 48 GB, index size: 1793 MB.

Name             ips        average  deviation         median         99th %
search          1.44      696.38 ms     ±4.84%      703.09 ms      730.72 ms
2019-05-10T16:57:06.32907 daemon.info: May 10 18:57:06 postgres: LOG:  duration: 717.003 ms  plan:
2019-05-10T16:57:06.32913 daemon.info: May 10 18:57:06 postgres: 	Query Text: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (a0."data"->>'type' = 'Create') AND ('https://www.w3.org/ns/activitystreams#Public' = ANY(a0."recipients")) AND (to_tsvector('english', o1."data"->>'content') @@ plainto_tsquery('english', $1)) ORDER BY a0."id" DESC LIMIT 20
2019-05-10T16:57:06.32915 daemon.info: May 10 18:57:06 postgres: 	Limit  (cost=77517.21..77517.26 rows=20 width=1434)
2019-05-10T16:57:06.32917 daemon.info: May 10 18:57:06 postgres: 	  ->  Sort  (cost=77517.21..77530.87 rows=5463 width=1434)
2019-05-10T16:57:06.32920 daemon.info: May 10 18:57:06 postgres: 	        Sort Key: a0.id DESC
2019-05-10T16:57:06.32921 daemon.info: May 10 18:57:06 postgres: 	        ->  Nested Loop  (cost=90.99..77371.85 rows=5463 width=1434)
2019-05-10T16:57:06.32924 daemon.info: May 10 18:57:06 postgres: 	              ->  Bitmap Heap Scan on objects o1  (cost=90.42..14027.99 rows=10248 width=690)
2019-05-10T16:57:06.32926 daemon.info: May 10 18:57:06 postgres: 	                    Recheck Cond: (to_tsvector('english'::regconfig, (data ->> 'content'::text)) @@ '''cofe'''::tsquery)
2019-05-10T16:57:06.32927 daemon.info: May 10 18:57:06 postgres: 	                    ->  Bitmap Index Scan on objects_fts  (cost=0.00..87.86 rows=10248 width=0)
2019-05-10T16:57:06.32929 daemon.info: May 10 18:57:06 postgres: 	                          Index Cond: (to_tsvector('english'::regconfig, (data ->> 'content'::text)) @@ '''cofe'''::tsquery)
2019-05-10T16:57:06.32931 daemon.info: May 10 18:57:06 postgres: 	              ->  Index Scan using activities_create_objects_index on activities a0  (cost=0.56..6.15 rows=3 width=744)
2019-05-10T16:57:06.32933 daemon.info: May 10 18:57:06 postgres: 	                    Index Cond: (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = (o1.data ->> 'id'::text))
2019-05-10T16:57:06.32938 daemon.info: May 10 18:57:06 postgres: 	                    Filter: (((data ->> 'type'::text) = 'Create'::text) AND ('https://www.w3.org/ns/activitystreams#Public'::text = ANY ((recipients)::text[])))

RUM index, after VACUUM ANALYZE. table size: 48 GB, index size: 6740 MB

Name             ips        average  deviation         median         99th %
search         39.77       25.14 ms    ±19.94%       23.76 ms       43.88 ms
2019-05-11T07:57:00.58987 daemon.info: May 11 09:57:00 postgres: LOG:  duration: 24.027 ms  plan:
2019-05-11T07:57:00.59001 daemon.info: May 11 09:57:00 postgres: 	Query Text: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') WHERE (a0."data"->>'type' = 'Create') AND ('https://www.w3.org/ns/activitystreams#Public' = ANY(a0."recipients")) AND (o1."fts_content" @@ plainto_tsquery('english', $1)) ORDER BY o1."inserted_at" <=> now()::date LIMIT 20
2019-05-11T07:57:00.59007 daemon.info: May 11 09:57:00 postgres: 	Limit  (cost=9.62..298.63 rows=20 width=1451)
2019-05-11T07:57:00.59010 daemon.info: May 11 09:57:00 postgres: 	  ->  Nested Loop  (cost=9.62..1168522.63 rows=80863 width=1451)
2019-05-11T07:57:00.59013 daemon.info: May 11 09:57:00 postgres: 	        ->  Index Scan using objects_fts on objects o1  (cost=9.06..171034.96 rows=154500 width=699)
2019-05-11T07:57:00.59017 daemon.info: May 11 09:57:00 postgres: 	              Index Cond: (fts_content @@ plainto_tsquery('english'::regconfig, $1))
2019-05-11T07:57:00.59024 daemon.info: May 11 09:57:00 postgres: 	              Order By: (inserted_at <=> ((now())::date)::timestamp without time zone)
2019-05-11T07:57:00.59029 daemon.info: May 11 09:57:00 postgres: 	        ->  Index Scan using activities_create_objects_index on activities a0  (cost=0.56..6.42 rows=3 width=744)
2019-05-11T07:57:00.59034 daemon.info: May 11 09:57:00 postgres: 	              Index Cond: (COALESCE(((data -> 'object'::text) ->> 'id'::text), (data ->> 'object'::text)) = (o1.data ->> 'id'::text))
2019-05-11T07:57:00.59039 daemon.info: May 11 09:57:00 postgres: 	              Filter: (((data ->> 'type'::text) = 'Create'::text) AND ('https://www.w3.org/ns/activitystreams#Public'::text = ANY ((recipients)::text[])))
Edited by lain

Merge request reports

Pipeline #11991 passed

Pipeline passed for ef63cf70 on rum-index

Test coverage 80.20% (0.00%) from 1 job
Approval is optional

Merged by rinpatchrinpatch 5 years ago (May 17, 2019 4:35pm UTC)

Merge details

  • Changes merged into develop with d2dacadb.
  • Deleted the source branch.

Pipeline #11995 passed

Pipeline passed for d2dacadb on develop

Test coverage 80.20% (0.00%) from 1 job

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • I think we can do the following

    config :pleroma, :database,
      rum_enabled: true

    Then create two new tasks in pleroma.database. One of which will enable rum and drop gin, and the other one would drop rum and enable gin. I don't think it is a good idea to force existing installations to install rum considering it does not bring any new functionality, only performance improvements. Also iirc not many hosted postgres db providers have rum in their extension list.

  • Is there a warning if rum does not exists or it would silently fail?

  • Contributor

    as I discussed with lain on fedi yesterday, this will ultimately be an optional feature since rum indexes aren't mainline.

  • @plataformatec do you guys have experience with these kinds of 'optional' migrations?

  • That's a very good question.

    Today we don't really support optional migrations out of the box.

    One option is to wrap the whole change function if a conditional check:

    if Application.get_env(:pleroma, :database)[:rum_enabled] do
      ...
    end

    The downside of this approach is that once you run the migration, if you enable run, you can't enable it again.

    Another option is to create a task that loads migrations from other directories. For example, you could put those migrations in priv/optional_migrations/rum_enabled/XYZ_do_something.exs. Then, in order to run those migrations, you can do something like this:

    Ecto.Migrator.with_repo(repo, fn repo ->
      # The default migration directory
      Ecto.Migrator.run(repo, :up, all: true)
    
      if Application.get_env(:pleroma, :database)[:rum_enabled] do
        Ecto.Migrator.run(repo, "priv/optional_migrations/rum_enabled", :up, all: true)
      end
    end)

    Then you put that inside "pleroma.migrate" and then you will always run all of the appropriate migrations depending on the user configuration. The benefit of this approach is that you have full control but you may also end up re-implementing the existing ecto.migrate/ecto.rollback commands. Note also we are using Ecto.Migrator.with_repo/2, which requires Ecto.SQL 3.1.2+ (released this weekend).

    Another idea is to send a PR to Ecto.SQL so we support a --migration-path option in Ecto. It should be relatively straight-forward to do so, and we will be happy to provide any guidance. With such flag, then people interested in using those special features can run those commands themselves, but it means they will be the ones doing most of the work.

    The other thing you will want to consider is testing. If you start having a bunch of optional migrations, you need to make sure your suite passes on all possible combinations. Luckily this can be done by using multiple environment variables on CI and handling those accordingly.

    Thoughts?

  • The --migration-path idea sounds interesting. Would that take a directory or a migration file as argument?

  • A directory with all migrations, so it should likely be called --migrations-path. Good point!

  • I am also for doing --migrations-path, and maybe aliasing pleroma.migrate/pleroma.rollback to

    defp migrate(args) do
       Mix.Tasks.Ecto.Migrate.run(args)
    
      if Pleroma.Config.get([:database, :rum_enabled]),
        do: Mix.Tasks.Ecto.Migrate.run(["--migrations-path priv/optional_migrations/rum_enabled" | args])
    end
    
    defp rollback(args) do
      Mix.Tasks.Ecto.Rollback.run(args)
    
      if Pleroma.Config.get([:database, :rum_enabled]),
        do: Mix.Tasks.Ecto.Rollback.run(["--migrations-path priv/optional_migrations/rum_enabled" | args])
    end

    for convenience

  • So I went ahead and implementations --migrations-path here: https://github.com/elixir-ecto/ecto_sql/commit/e839a9a327b632d73533ac8105ba360bc831cf83

    Feel free to give it a try and, if it works, we can do a new release.

    Btw, you will have to split the option from the argument when calling run, like this:

    Mix.Tasks.Ecto.Migrate.run(["--migrations-path", "priv/optional_migrations/rum_enabled" | args])

    Other than that everything looks good!

  • Contributor

    That commit looks like it will work well for us.

  • lain added 1 commit

    added 1 commit

    • f1e67bdc - Search: Add optional rum indexing / searching.

    Compare with previous version

  • lain mentioned in merge request !1160 (merged)

    mentioned in merge request !1160 (merged)

  • lain added 157 commits

    added 157 commits

    • f1e67bdc...e5b34f5e - 156 commits from branch develop
    • 412a3d8a - Merge branch 'develop' of git.pleroma.social:pleroma/pleroma into rum-index

    Compare with previous version

  • lain added 1 commit

    added 1 commit

    • 6a8491d9 - CI: Add rum variant testing.

    Compare with previous version

  • lain added 1 commit

    added 1 commit

    • 3c0fe26f - CI: Add rum variant testing.

    Compare with previous version

  • lain added 1 commit

    added 1 commit

    • 89391a3f - CI: Add rum variant testing.

    Compare with previous version

  • lain added 1 commit

    added 1 commit

    • feb1edf3 - CI: Add rum variant testing.

    Compare with previous version

  • rinpatch
  • lain added 1 commit

    added 1 commit

    • b79618ef - CI: Add rum variant testing.

    Compare with previous version

  • rinpatch
  • lain added 1 commit

    added 1 commit

    • a3fc7294 - CI: Add rum variant testing.

    Compare with previous version

  • lain resolved all discussions

    resolved all discussions

  • lain added 3 commits

    added 3 commits

    • 022e6e4b - RUM: Remove vacuum analyze from migration
    • 8784a7d1 - RUM: Set rum status by the environment.
    • b986ec49 - CI: Use the correct image with the correct hostname.

    Compare with previous version

  • lain added 1 commit

    added 1 commit

    • ef63cf70 - CI: Use the correct image with the correct hostname.

    Compare with previous version

  • lain unmarked as a Work In Progress

    unmarked as a Work In Progress

  • I didn't wrap the migration in a way that it gets automatically run when the rum setting is set, mainly because I want people to pay attention when adding it and really know what they are doing.

  • Nice!

  • merged

  • rinpatch mentioned in commit d2dacadb

    mentioned in commit d2dacadb

  • Please register or sign in to reply
    Loading