WIP: (VERY WIP) big refactor of our ingestion pipeline
Closes #806
About this MR
This is a reworking of our 'ingestion pipeline' (i.e. the way objects get into our system) that introduces some new concepts.
Before, CommonAPI and Transmogrifier had overlapping responsibilities that were not properly abstracted. Both validated activities and checked for actor/object presence, but each did so in its own way, and the two did not always reach the same result.
This MR unifies both by making the way that objects are validated and ingested explicit.
Concepts
In general, there's the following pipeline that all incoming activities will go through now:
- Building / Modifying (Different between CommonAPI / Transmogrifier)
- Casting and validation
- MRF
- Persistence
- Side effect handling
- Federation
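To make the shape of the pipeline concrete, here is a minimal, language-neutral sketch in Python. It is not the Elixir code in this MR: the function names (`run_pipeline`, `validate`, `mrf`, `persist`) and the `("ok", value)` / `("error", reason)` result shape are assumptions chosen for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple

# Each stage takes an activity (a plain dict here) and returns
# ("ok", activity) or ("error", reason). All names are hypothetical.
Result = Tuple[str, Any]
Stage = Callable[[Dict[str, Any]], Result]


def run_pipeline(activity: Dict[str, Any], stages: List[Stage]) -> Result:
    """Run the stages in order and stop at the first error."""
    for stage in stages:
        status, value = stage(activity)
        if status != "ok":
            # Report which stage failed so the error is easy to pinpoint.
            return ("error", (stage.__name__, value))
        activity = value
    return ("ok", activity)


def validate(activity: Dict[str, Any]) -> Result:
    if "type" not in activity:
        return ("error", "missing type")
    return ("ok", activity)


def mrf(activity: Dict[str, Any]) -> Result:
    # Placeholder policy: accept everything unchanged.
    return ("ok", activity)


def persist(activity: Dict[str, Any]) -> Result:
    # Placeholder: pretend the database assigned an id.
    return ("ok", {**activity, "id": 1})
```

Calling `run_pipeline({"type": "Like"}, [validate, mrf, persist])` returns the persisted activity, while an activity without a `type` stops at `validate` and never reaches MRF or persistence.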
0. Building / Modifying
CommonAPI will build a map of the activity according to our policies, while Transmogrifier will modify the incoming message to fit our internal format. Transmogrifier can use some of the casting / validation features to achieve this (that is, to get the data into the basic shape), but it might still need to take some additional steps to make the activity validate completely. For example, many of our activities require a context, so Transmogrifier will need to add that if it's missing.
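As an illustration of that kind of fix-up step, the following Python sketch fills in a missing context before validation. The helper name `ensure_context` and the way the fallback context is generated (a random `urn:uuid:` value) are assumptions made for this example, not what Transmogrifier actually does.

```python
import uuid
from typing import Any, Dict


def ensure_context(activity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the activity that is guaranteed to have a context.

    Hypothetical helper: the fallback value is an assumption, used only
    to show the shape of a "modify before validating" step.
    """
    if activity.get("context"):
        return dict(activity)
    return {**activity, "context": f"urn:uuid:{uuid.uuid4()}"}
```

The input is never mutated, so the original incoming message stays available for debugging.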
1. Casting and validation
This is now done with non-persisted Ecto schemas whose custom types act as validators. This makes it possible to properly validate incoming activities and ensure a certain format inside our database. With custom types, this can also handle things like casting a value such as to: "someuser" into to: ["someuser"] seamlessly.
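The recipient-casting behaviour can be sketched roughly as follows. This is a Python stand-in for the idea of a custom type, not the Ecto type from this MR; the function name `cast_recipients` and the choice to reject anything that is not a string or a list of strings are assumptions.

```python
from typing import Any, List


def cast_recipients(value: Any) -> List[str]:
    """Cast a recipient field to a list of strings, or raise ValueError.

    Hypothetical stand-in for a custom type: a bare string is wrapped in
    a list, a list of strings is passed through, anything else is invalid.
    """
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise ValueError(f"invalid recipients: {value!r}")
```

Because casting and validation happen in the same step, later stages can rely on the field always being a list.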
2. MRF
Nothing much changed here, but MRF can be sure that it receives a validated object now.
3. Persistence
In contrast to the existing ActivityPub.insert, this step really just involves persisting the Activity or Object in our database. There's no triggering of any side effects or checking of validity involved.
4. Side effect handling
This part executes all the side effects that are associated with the Activity. These are both the AP side effects (like actually creating an object for a Create activity) and our own side effects (like pushing notifications out). This part does not necessarily have to run before the pipeline returns and can be spun out into a delayed job.
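One way to picture this step is a table mapping activity types to the effects they trigger. The Python sketch below is only an illustration: the names `SIDE_EFFECTS`, `create_object` and `notify`, and the in-memory `store` dict, are hypothetical and do not mirror the actual module layout.

```python
from typing import Any, Callable, Dict, List

Store = Dict[str, Any]
Effect = Callable[[Dict[str, Any], Store], None]


def create_object(activity: Dict[str, Any], store: Store) -> None:
    # AP side effect: a Create activity brings its object into existence.
    obj = activity["object"]
    store["objects"][obj["id"]] = obj


def notify(activity: Dict[str, Any], store: Store) -> None:
    # Our own side effect: queue a notification (here just a string).
    store["notifications"].append(f"{activity['actor']} did {activity['type']}")


# Hypothetical mapping from activity type to the effects it triggers.
SIDE_EFFECTS: Dict[str, List[Effect]] = {
    "Create": [create_object, notify],
    "Like": [notify],
}


def handle_side_effects(activity: Dict[str, Any], store: Store) -> Store:
    """Run every effect registered for this activity type, in order."""
    for effect in SIDE_EFFECTS.get(activity["type"], []):
        effect(activity, store)
    return store
```

Since the function only needs the already persisted activity, it could just as well be called from a background job instead of inline.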
5. Federation
This federates the persisted Activity if needed. This can be run in parallel with side effect handling in some cases.
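A rough Python sketch of running the last two steps side by side, assuming they are independent for the activity in question. The functions `handle_side_effects` and `federate` are placeholders that only return a description of what they would do; whether the two steps can safely overlap depends on the activity and is not decided by this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Tuple


def handle_side_effects(activity: Dict[str, Any]) -> str:
    # Placeholder for the real side effect handling.
    return f"side effects handled for {activity['id']}"


def federate(activity: Dict[str, Any]) -> str:
    # Placeholder for delivering the activity to remote servers.
    return f"federated {activity['id']}"


def finish(activity: Dict[str, Any]) -> Tuple[str, str]:
    """Run side effect handling and federation concurrently and wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        effects = pool.submit(handle_side_effects, activity)
        delivery = pool.submit(federate, activity)
        return effects.result(), delivery.result()
```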
Advantages
- Proper separation of concerns
- Partial unification of CommonAPI and Transmogrifier
- Better validations
- Much better errors: you can usually pinpoint exactly what is wrong with an incoming activity just from the error stack.
- (Later) Run side effects asynchronously
- (Later) Semaphores to prevent two activities acting on the same object