
WIP: (VERY WIP) big refactor of our ingestion pipeline

lain requested to merge remake-remodel into develop

Closes #806 (closed)

About this MR

This is a reworking of our 'ingestion pipeline' (i.e. the way objects get into our system) that introduces some new concepts.

Before, CommonAPI and Transmogrifier had overlapping responsibilities that were not properly abstracted. Both did validations and checks for actor/object presence, but each did so in its own way, and the two did not always arrive at the same result.

This MR unifies both by making the way that objects are validated and ingested explicit.

Concepts

In general, all incoming activities now go through the following pipeline:

  0. Building / Modifying (Different between CommonAPI / Transmogrifier)
  1. Casting and validation
  2. MRF
  3. Persistence
  4. Side effect handling
  5. Federation
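
To make the shape of this concrete, here is a rough, language-neutral sketch in Python (not the actual Elixir code in this MR). All function names and the ("ok", ...) / ("error", ...) tuple convention are assumptions made for illustration. The building / modifying step is left out because it differs between CommonAPI and Transmogrifier.

```python
from typing import Callable

# Each stage takes an activity (a plain dict here) and returns either
# ("ok", activity) or ("error", reason). All stage names are hypothetical.
Stage = Callable[[dict], tuple]

def run_pipeline(activity: dict, stages: list) -> tuple:
    """Run the stages in order and stop at the first one that fails."""
    for stage in stages:
        status, result = stage(activity)
        if status != "ok":
            # Report which stage rejected the activity and why.
            return ("error", {"stage": stage.__name__, "reason": result})
        activity = result
    return ("ok", activity)

def validate(activity: dict) -> tuple:
    # Stand-in for casting and validation: only checks that a type exists.
    if "type" not in activity:
        return ("error", "missing type")
    return ("ok", activity)

def mrf(activity: dict) -> tuple:
    # Placeholder policy: pass everything through unchanged.
    return ("ok", activity)

def persist(activity: dict) -> tuple:
    # Placeholder: mark the activity as stored instead of writing to a database.
    return ("ok", {**activity, "persisted": True})

def handle_side_effects(activity: dict) -> tuple:
    # Placeholder: a real version would run the effects or enqueue them.
    return ("ok", activity)

def federate(activity: dict) -> tuple:
    # Placeholder: a real version would deliver the activity if needed.
    return ("ok", activity)

STAGES = [validate, mrf, persist, handle_side_effects, federate]
```

Calling run_pipeline({"type": "Like"}, STAGES) walks through every stage, while run_pipeline({}, STAGES) stops at validate and reports that stage in the error.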

0. Building / Modifying

CommonAPI will build a map of the activity according to our policies, while Transmogrifier will modify the incoming message to fit our internal format. Transmogrifier can use some of the casting / validation features to achieve this (that is, to get the data into the basic shape), but it might still need some additional steps to make the activity validate completely. For example, many of our activities require a context, so Transmogrifier will need to add one if it's missing.
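
As an illustration of the "add a context if it's missing" step, a minimal Python sketch follows. The helper name ensure_context and the idea of passing the fallback in as a parameter are assumptions; this MR description doesn't say how the context value is actually derived.

```python
def ensure_context(activity: dict, default_context: str) -> dict:
    """Return the activity with a context set (hypothetical helper).

    The fallback value is supplied by the caller, because how a context
    is derived is not specified here.
    """
    if activity.get("context"):
        return activity
    # Build a new dict so the incoming message is left untouched.
    return {**activity, "context": default_context}
```

An activity that already carries a context passes through unchanged, so the step is safe to run on every incoming message.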

1. Casting and validation

This is now done with non-persisted Ecto schemas whose custom types are used as validators. This makes it possible to properly validate incoming activities and ensure a certain format inside our database. Custom types can also handle things like turning to: "someuser" into to: ["someuser"] seamlessly during casting.
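
The idea can be sketched without Ecto. The Python below is only an analogy for "custom types as validators": each field has a cast function that either normalizes the value or rejects it, and errors are collected per field. The schema contents (LIKE_SCHEMA and its fields) and all function names are made up for the example and are not the schemas from this MR.

```python
def cast_string(value):
    """Accept strings as-is, reject everything else."""
    if isinstance(value, str):
        return value
    raise ValueError("must be a string")

def cast_recipients(value):
    """Cast a single recipient to a one-element list; accept lists of strings."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value
    raise ValueError("must be a string or a list of strings")

# Hypothetical schema: field name -> (cast function, required?)
LIKE_SCHEMA = {
    "id": (cast_string, True),
    "type": (cast_string, True),
    "actor": (cast_string, True),
    "object": (cast_string, True),
    "to": (cast_recipients, False),
}

def cast_and_validate(data: dict, schema: dict) -> tuple:
    """Cast known fields, drop unknown ones, and collect per-field errors."""
    cast, errors = {}, {}
    for field, (caster, required) in schema.items():
        if field not in data:
            if required:
                errors[field] = "is required"
            continue
        try:
            cast[field] = caster(data[field])
        except ValueError as exc:
            errors[field] = str(exc)
    return ("error", errors) if errors else ("ok", cast)
```

Because errors are keyed by field, a rejected activity says exactly which fields were wrong, which is the property the validation step is after.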

2. MRF

Nothing much changed here, but MRF can be sure that it receives a validated object now.

3. Persistence

In contrast to the existing ActivityPub.insert, this step really just involves persisting the Activity or Object in our database. There's no triggering of any side effects or checking of validity involved.

4. Side effect handling

This part executes all the side effects that are associated with the Activity. These are both the AP side effects (like actually creating an object for a Create activity) and our own side effects (like pushing notifications out). This part does not necessarily have to run before the pipeline returns and can be spun out into a delayed job.
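
A rough Python sketch of what "side effects per activity type, optionally deferred" could look like. The registry, the decorator, the in-memory queue and the log list are all stand-ins invented for this example; a real version would use an actual job queue.

```python
from collections import deque

# Hypothetical registry: activity type -> list of side-effect functions.
SIDE_EFFECTS = {}

def side_effect(activity_type):
    """Register a function as a side effect of the given activity type."""
    def register(fn):
        SIDE_EFFECTS.setdefault(activity_type, []).append(fn)
        return fn
    return register

log = []  # Stands in for real work so the example can be inspected.

@side_effect("Create")
def create_object(activity):
    # AP side effect: a Create activity brings its object into existence.
    log.append(("create_object", activity["object"]))

@side_effect("Create")
def notify(activity):
    # Our own side effect: push a notification out.
    log.append(("notify", activity["actor"]))

job_queue = deque()  # Stand-in for a delayed-job backend.

def handle_side_effects(activity, deferred=False):
    """Run the side effects now, or enqueue them to run later."""
    effects = SIDE_EFFECTS.get(activity["type"], [])
    if deferred:
        job_queue.extend((fn, activity) for fn in effects)
    else:
        for fn in effects:
            fn(activity)

def run_jobs():
    """Drain the queue, running each deferred side effect in order."""
    while job_queue:
        fn, activity = job_queue.popleft()
        fn(activity)
```

With deferred=True nothing runs until run_jobs() is called, so the pipeline can return right after persistence.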

5. Federation

This federates the persisted Activity if needed. This can be run in parallel with side effect handling in some cases.

Advantages

  • Proper separation of concerns
  • Partial unification of CommonAPI and Transmogrifier
  • Better validations
  • Much better errors: you can usually pinpoint exactly what is wrong with an incoming activity just from the error stack.
  • (Later) Run side effects asynchronously
  • (Later) Semaphores to prevent two activities acting on the same object
Edited by Ivan Tashkinov
