Multi-language posting
It would be beneficial to allow users to specify the language of a post, and to use contentMap and similar properties in ActivityStreams to mark the chosen language(s). A post can have multiple versions in different languages. The plan is as follows:
- Mastodon API
  - `language` param when posting: Mastodon only allows it to be a string (enum). We should not enforce the enum requirement, meaning we should allow arbitrary strings as language codes. We could, however, parse the language code to decompose it into lang, extlang, writing system and region, in order to support language filtering per user or as an instance default.
  - We can also allow it to be a list of strings, so users can post in multiple languages.
    - When `language` is a list, require `status_map`, `spoiler_text_map` and `poll["options_map"][*]` to be maps from language code to the intended text. Disregard `status`, `spoiler_text` and `poll["options"]`.
  - When returning a status, use `status_map`, `spoiler_text_map` and `poll["options_map"]` to return the language-marked versions. Use the original attrs to return the text containing every language version (see below).
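The posting rules above could be checked roughly as follows. This is only a sketch: the parameter names come from the proposal, while `normalize_status_params`, `_require_map` and the returned shape are hypothetical. It also assumes that every listed language must have an entry in each map, and that `spoiler_text_map` and the poll maps are only validated when they are supplied. A missing `language` (server default) is out of scope here.

```python
def _require_map(value, languages, name):
    """Check that `value` maps every requested language code to a text."""
    if not isinstance(value, dict):
        raise ValueError(f"{name} must be a map from language code to text")
    missing = [lang for lang in languages if lang not in value]
    if missing:
        raise ValueError(f"{name} is missing languages: {missing}")
    return {lang: value[lang] for lang in languages}


def normalize_status_params(params):
    """Turn posting params into per-language maps (hypothetical helper)."""
    language = params.get("language")
    poll = params.get("poll") or {}

    if isinstance(language, str):
        # Single-language post: any string is accepted (no enum check), and
        # the plain fields are wrapped into one-entry maps.
        result = {
            "languages": [language],
            "status_map": {language: params.get("status", "")},
        }
        if "spoiler_text" in params:
            result["spoiler_text_map"] = {language: params["spoiler_text"]}
        if "options" in poll:
            result["poll_options_map"] = [{language: text} for text in poll["options"]]
        return result

    if not isinstance(language, list) or not language:
        raise ValueError("language must be a string or a non-empty list of strings")

    # Multi-language post: the *_map fields are used, and the plain
    # status / spoiler_text / poll["options"] fields are disregarded.
    result = {
        "languages": list(language),
        "status_map": _require_map(params.get("status_map"), language, "status_map"),
    }
    if "spoiler_text_map" in params:
        result["spoiler_text_map"] = _require_map(
            params["spoiler_text_map"], language, "spoiler_text_map"
        )
    if "options_map" in poll:
        result["poll_options_map"] = [
            _require_map(option, language, "poll options_map entry")
            for option in poll["options_map"]
        ]
    return result
```

For example, `normalize_status_params({"language": ["en", "fr"], "status_map": {"en": "Hello"}})` raises a `ValueError` because the `fr` version is missing.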
- AP representation
  - contentMap contains `{"lang-code": "content"}`.
  - content contains a string composed of every language version, each merged into a template (the default can be `[{lang-code}] {text}`) and joined with a separator (could be `<br><br>---<br><br>`). Both the template and the separator must be configurable. If contentMap contains only one language, do not use the template. Instead, copy what is in `contentMap[lang]` into `content`.
    - Note: as `content` means `content` with `@language=und` in our JSON-LD structure (specified by our header), we might want to disallow the value `und` in language.
    - We could have a different template and separator for single-line content (e.g. subject, image description, polls).
  - If an incoming object has contentMap, disregard its content and compute our own version instead.
  - Use some AP attribute to indicate the preferred order of languages in the merged content (specified by the Mastodon API `language` attribute). This is optional.
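As an illustration of how the merged content could be computed from contentMap, here is a minimal sketch. The default template and separator are the ones suggested above. The function name `merge_content_map` and its `order` parameter (standing in for the optional preferred language order) are assumptions made for this example.

```python
DEFAULT_TEMPLATE = "[{lang-code}] {text}"
DEFAULT_SEPARATOR = "<br><br>---<br><br>"


def merge_content_map(content_map, order=None,
                      template=DEFAULT_TEMPLATE, separator=DEFAULT_SEPARATOR):
    """Build the plain `content` string from a contentMap (hypothetical helper)."""
    if not content_map:
        return ""
    if len(content_map) == 1:
        # Only one language: copy it as is, without applying the template.
        return next(iter(content_map.values()))

    # Languages named in `order` come first; the rest keep the map's own order.
    langs = [lang for lang in (order or []) if lang in content_map]
    langs += [lang for lang in content_map if lang not in langs]

    rendered = [
        template.replace("{lang-code}", lang).replace("{text}", content_map[lang])
        for lang in langs
    ]
    return separator.join(rendered)
```

Passing a different `template` and `separator` covers the single-line case (subject, image description, poll options), for instance `template="{text}"` with `separator=" / "`.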
- language filtering
  - We should support inclusive and exclusive filtering.
  - When one language code is filtered, all subvariants of it should be filtered as well. For example, if a user chooses to include `en`, we should also include posts marked with `en-Latn`, `en-CA`, `en-Latn-US`, and so on.
  - Don't make assumptions about the subtags. Use them exactly as the user specified.
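One way to read the subvariant rule is as prefix matching on whole subtags, similar to the basic filtering scheme in RFC 4647. The sketch below follows that reading. The names `tag_matches` and `is_visible` are hypothetical, comparing case-insensitively is an assumption, and so is the handling of multi-language and unmarked posts described in the comments.

```python
def tag_matches(filter_code, post_code):
    """True if post_code equals filter_code or is a subvariant of it."""
    wanted = filter_code.lower().split("-")
    actual = post_code.lower().split("-")
    # Compare whole subtags, so "en" matches "en-CA" but not "eng".
    return actual[:len(wanted)] == wanted


def is_visible(post_languages, filters, mode="include"):
    """Decide whether a post passes a user's language filter."""
    if not post_languages:
        # Assumption: posts without any language are hidden by an include
        # list and kept by an exclude list.
        return mode == "exclude"
    hits = [any(tag_matches(f, lang) for f in filters) for lang in post_languages]
    if mode == "include":
        # Assumption: one matching language version is enough to show the post.
        return any(hits)
    if mode == "exclude":
        # Assumption: hide the post only if every language version is excluded.
        return not all(hits)
    raise ValueError("mode must be 'include' or 'exclude'")
```

Because no subtag is interpreted, a filter on `en-US` does not match `en-Latn-US` here. Only what the user typed is compared.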
- Misc/Implementation details
  - If we want to implement a language dropdown in the frontend, I did this in Glitch-lily, which pulls the language codes directly from the IANA registry: https://lily-is.land/infra/glitch-lily/-/commit/3cea07639462f9e9c892ed2c9a24a853fd9a7515
    - In the frontend we should not enforce an enum. Give the user the option to choose a language, and optionally the variant subtags (writing system, region, etc.).
    - We can, however, make a list of "commonly used languages" and put them at the top of the list. This should at least include all the languages we have a translation for. `ja_easy` can be represented as `ja-Hrkt`.
  - We need more cache entries to store the different language versions of the same post.
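To prefill the language and variant-subtag choices, a code could be decomposed leniently by subtag shape, following the BCP 47 layout (extlang: three letters, script: four letters, region: two letters or three digits). This is a rough sketch. `decompose_language_code` and `LEGACY_CODES` are hypothetical names, nothing is validated against the IANA registry, and anything unrecognized is kept untouched in `rest`.

```python
# Mapping for existing translation identifiers; the single entry is the
# example from the notes above.
LEGACY_CODES = {"ja_easy": "ja-Hrkt"}


def _is_alpha(subtag, length):
    return len(subtag) == length and subtag.isascii() and subtag.isalpha()


def decompose_language_code(code):
    """Split a language code into language, extlang, script, region and rest."""
    parts = LEGACY_CODES.get(code, code).split("-")
    result = {"language": parts[0], "extlang": [], "script": None,
              "region": None, "rest": []}
    i = 1
    # Up to three 3-letter extlang subtags may follow the primary language.
    while i < len(parts) and len(result["extlang"]) < 3 and _is_alpha(parts[i], 3):
        result["extlang"].append(parts[i])
        i += 1
    # Writing system (script): four letters, e.g. "Latn".
    if i < len(parts) and _is_alpha(parts[i], 4):
        result["script"] = parts[i]
        i += 1
    # Region: two letters ("CA") or three digits ("419").
    if i < len(parts) and (
        _is_alpha(parts[i], 2) or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        result["region"] = parts[i]
        i += 1
    # Variants, extensions and anything unknown are passed through as is.
    result["rest"] = parts[i:]
    return result
```

Since arbitrary strings are allowed as language codes, the result is only a display hint for the dropdown. The original string should remain what is stored and compared.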
Edited by tusooa