Rich media embeds (!3401) · Merge requests · Pleroma / pleroma

Alex Gleason requested to merge rich-media into develop May 05, 2021

This is an entirely new version of !3366 (closed). This is my third (or fourth?) iteration on the problem of rich media embeds, and after a lot of thought I decided the code needed a major redesign. I investigated some third-party solutions, but for one reason or another they were not a good fit for Pleroma, so this time I built a solution from the ground up.

The goal is to limit the number of HTTP requests. Services like YouTube enforce oppressive rate limiting and will block even small Pleroma servers. The solution is to send the oEmbed request directly to the service's oEmbed endpoint rather than scraping the page first to discover it.

I created an Elixir library, oembed_providers_elixir, which loads JSON provider data that Pleroma now uses to skip unnecessary HTTP requests. If it can't produce the embed that way, it'll fall back to scraping the page.
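To illustrate the idea (this is not the actual API of oembed_providers_elixir), here is a minimal sketch of matching a URL against provider schemes and building the oEmbed request URL without fetching the page. The module name `ProviderSketch`, the function `oembed_url/1`, and the trimmed provider entry are assumptions made for this example; the entry is only shaped like the public oEmbed providers JSON.

```elixir
defmodule ProviderSketch do
  # Illustrative provider entry, shaped like the public oEmbed providers JSON
  # and trimmed to the fields this sketch uses. Not the library's real data.
  @providers [
    %{
      "provider_name" => "YouTube",
      "endpoints" => [
        %{
          "schemes" => ["https://*.youtube.com/watch*", "https://youtu.be/*"],
          "url" => "https://www.youtube.com/oembed"
        }
      ]
    }
  ]

  # Returns {:ok, request_url} when a known scheme matches the URL, or :error
  # when the caller would have to fall back to scraping the page.
  def oembed_url(url) when is_binary(url) do
    endpoints = Enum.flat_map(@providers, & &1["endpoints"])

    case Enum.find(endpoints, &matches_any?(&1["schemes"], url)) do
      nil ->
        :error

      %{"url" => endpoint} ->
        query = URI.encode_query(%{"url" => url, "format" => "json"})
        {:ok, endpoint <> "?" <> query}
    end
  end

  defp matches_any?(schemes, url), do: Enum.any?(schemes, &scheme_match?(&1, url))

  # Turn a scheme such as "https://youtu.be/*" into an anchored regex:
  # "*" becomes a wildcard and every other character is escaped.
  defp scheme_match?(scheme, url) do
    pattern =
      scheme
      |> String.split("*")
      |> Enum.map_join(".*", &Regex.escape/1)

    Regex.match?(Regex.compile!("^" <> pattern <> "$"), url)
  end
end
```

With this sketch, `ProviderSketch.oembed_url("https://youtu.be/abc")` yields an `{:ok, url}` tuple pointing at the provider's endpoint, while an unknown URL yields `:error`, which is the point where scraping would take over.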

There are two new structs:

%Embed{} - inspired by Furlex, this struct holds any and all embed data about a URL. The Parser now returns this type. Individual parsers build an Embed rather than a messy, undefined map.

Example:

%Embed{
  url: "http://example.com/ogp",
  title: "The Rock (1996)",
  meta: %{
    "og:image" => "http://ia.media-imdb.com/images/rock.jpg",
    "og:title" => "The Rock",
    "og:description" => "Directed by Michael Bay. With Sean Connery, Nicolas Cage, Ed Harris, John Spencer.",
    "og:type" => "video.movie",
    "og:url" => "http://www.imdb.com/title/tt0117500/"
  },
  oembed: nil
}
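For readers who want to see the shape as code, a struct with these four fields could be declared roughly as follows. The field names come from the example above; the module name `EmbedSketch`, the defaults, the typespec, and the helpers `new/1` and `put_meta/3` are assumptions for illustration and may differ from the merged code.

```elixir
defmodule EmbedSketch do
  # Field names follow the example above; the defaults are assumptions.
  defstruct url: nil, title: nil, meta: %{}, oembed: nil

  @type t :: %__MODULE__{
          url: String.t() | nil,
          title: String.t() | nil,
          meta: map(),
          oembed: map() | nil
        }

  # Hypothetical constructor: start an empty embed for a URL.
  def new(url) when is_binary(url), do: %__MODULE__{url: url}

  # Hypothetical helper: record one meta tag on the embed.
  def put_meta(%__MODULE__{meta: meta} = embed, key, value) when is_binary(key) do
    %{embed | meta: Map.put(meta, key, value)}
  end
end
```

Because the struct has a fixed set of keys, a typo such as `%{embed | titel: "x"}` raises instead of silently adding a new key, which is the main advantage over an undefined map.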

%Card{} - represents a MastoAPI card, plus sanitization and validation. Embeds get turned into Cards, which ultimately get returned through the API.

Example:

%Card{
  type: "link",
  title: "She Was Arrested at 14. Then Her Photo Went to a Facial Recognition Database.",
  url: "https://www.nytimes.com/2019/08/01/nyregion/nypd-facial-recognition-children-teenagers.html",
  description: "With little oversight, the N.Y.P.D. has been using powerful surveillance technology on photos of children and teenagers.",
  image: "https://static01.nyt.com/images/2019/08/01/nyregion/01nypd-juveniles-promo/01nypd-juveniles-promo-videoSixteenByNineJumbo1600.jpg",
  provider_name: "www.nytimes.com",
  provider_url: "https://www.nytimes.com"
}
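The Embed-to-Card step might look something like the sketch below. It assumes the provider fields are derived from the URL's host (which matches the example above) and that validation means rejecting non-HTTP(S) URLs and missing titles. `CardSketch`, `from_embed/1`, and the error atoms are hypothetical names, not the ones used in this MR.

```elixir
defmodule CardSketch do
  defstruct [:type, :title, :url, :description, :image, :provider_name, :provider_url]

  # Build a card from an embed-like map or struct with :url and :meta keys
  # (and optionally :title). Returns {:ok, card} or {:error, reason}.
  def from_embed(%{url: url, meta: meta} = embed) when is_binary(url) and is_map(meta) do
    uri = URI.parse(url)
    title = Map.get(embed, :title) || meta["og:title"]

    cond do
      uri.scheme not in ["http", "https"] or uri.host in [nil, ""] ->
        {:error, :invalid_url}

      not is_binary(title) or String.trim(title) == "" ->
        {:error, :missing_title}

      true ->
        {:ok,
         %__MODULE__{
           type: "link",
           title: String.trim(title),
           url: url,
           description: meta["og:description"],
           image: http_url_or_nil(meta["og:image"]),
           provider_name: uri.host,
           provider_url: "#{uri.scheme}://#{uri.host}"
         }}
    end
  end

  def from_embed(_), do: {:error, :invalid_embed}

  # Keep only absolute http(s) URLs; anything else (e.g. "javascript:") is dropped.
  defp http_url_or_nil(value) when is_binary(value) do
    case URI.parse(value) do
      %URI{scheme: scheme, host: host}
      when scheme in ["http", "https"] and is_binary(host) and host != "" ->
        value

      _ ->
        nil
    end
  end

  defp http_url_or_nil(_), do: nil
end
```

Keeping validation in one constructor means the API layer only ever sees a card that passed these checks or an error tuple.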

All of the code now centers around these two types, greatly simplifying it.

Some other design choices:

Parser modules no longer stop when they find empty data. All available parsers run and build out different parts of the Embed struct, so their order doesn't matter anymore.
I ditched MetaTagsParser and replaced it with a completely new module, MetaTags. I think the old one was trying to do too much. The new one simply parses all <meta> tags on the page into a map.
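A rough sketch of how these two choices fit together: every parser receives the page and the embed built so far, fills in its own part, and passes the embed on. A plain map stands in for the Embed struct, and the regex-based tag extraction is for brevity only (real HTML needs a real parser). The names `ParserPipelineSketch`, `meta_tags/2`, `title_tag/2`, and `parse/3` are assumptions, not the modules in this MR.

```elixir
defmodule ParserPipelineSketch do
  # Naive stand-in for a MetaTags-style parser: collects the property/name
  # and content attributes of <meta> tags into the embed's :meta map.
  def meta_tags(html, embed) do
    regex = ~r/<meta\s+(?:property|name)="([^"]+)"\s+content="([^"]*)"/i
    tags = for [_, key, value] <- Regex.scan(regex, html), into: %{}, do: {key, value}
    %{embed | meta: Map.merge(embed.meta, tags)}
  end

  # Second toy parser: fills :title from the <title> element, if present.
  def title_tag(html, embed) do
    case Regex.run(~r/<title>([^<]*)<\/title>/i, html) do
      [_, title] -> %{embed | title: String.trim(title)}
      nil -> embed
    end
  end

  # Run the default parsers.
  def parse(html, url), do: parse(html, url, [&meta_tags/2, &title_tag/2])

  # Run every parser in the list; none of them short-circuits the others.
  def parse(html, url, parsers) do
    empty = %{url: url, title: nil, meta: %{}, oembed: nil}
    Enum.reduce(parsers, empty, fn parser, embed -> parser.(html, embed) end)
  end
end
```

Since each toy parser writes to a different field, passing the list in reverse order produces the same embed, which is the property described above.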

Screenshots:

[Screenshot: Rich media embeds]
