beam.smp eating CPU and RAM
Environment
- Installation type: Source
- Pleroma version: 2.4.3
- Elixir version: Erlang/OTP 23 [erts-11.1.8] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1]; Elixir 1.10.3 (compiled with Erlang/OTP 22)
- Operating system: Debian 11.5
- PostgreSQL version: 13.8
Bug description
I'm on a VPS with one dedicated core (AMD Ryzen 9 3900X) and 4 GB of RAM. pleroma-fe has become practically unusable: loading profiles, threads, and other pages causes the beam.smp process to consume up to 3 GB of memory and 100% of the CPU. Making a post (with or without attachments) or loading a large thread makes it run out of memory and crash unless I use swap space.
So far I've tried disabling Elixir's busy waiting (see the flags below) and tuning PostgreSQL's settings to use less RAM, but I haven't seen any improvement.
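For reference, by "disabling busy waiting" I mean the usual set of scheduler flags, roughly like this (exact placement depends on how Pleroma is started; for a source install they'd go in vm.args or ELIXIR_ERL_OPTIONS):

# disable busy waiting on the normal, dirty-CPU and dirty-IO schedulers
+sbwt none
+sbwtdcpu none
+sbwtdio none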
It was suggested that I check the Oban queue, so here is the output from that:
pleroma=# select count((state,args->'params'->'type')), (state,args->'params'->'type') from oban_jobs where state not in ('completed') group by (state,args->'params'->'type') order by count((state,args->'params'->'type'));
count | row
-------+------------------------------
1 | (executing,"""Undo""")
1 | (executing,"""View""")
6 | (executing,"""EmojiReact""")
17 | (discarded,"""Delete""")
18 | (retryable,"""Delete""")
25 | (executing,"""Like""")
42 | (executing,"""Create""")
49 | (executing,"""Delete""")
82 | (available,)
104 | (executing,"""Announce""")
221 | (executing,)
(11 rows)
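Hundreds of jobs sitting in 'executing' looks suspicious to me, since a healthy queue shouldn't hold that many at once. Something along these lines should show whether they're actually running or just stranded by the earlier crashes (attempted_at is when Oban picked the job up; I haven't dug into this yet):

-- how old are the jobs stuck in 'executing'?
select min(attempted_at), max(attempted_at)
from oban_jobs
where state = 'executing';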
pleroma=# select worker, count(worker) from oban_jobs where state not in ('completed') group by worker order by count(worker);
worker | count
------------------------------------------+-------
Pleroma.Workers.WebPusherWorker | 3
Pleroma.Workers.PublisherWorker | 54
Pleroma.Workers.Cron.DigestEmailsWorker | 82
Pleroma.Workers.AttachmentsCleanupWorker | 164
Pleroma.Workers.ReceiverWorker | 259
(5 rows)
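If those 'executing' jobs do turn out to be stranded leftovers from the crashes, I assume they could be put back on the queue with something like the following, though I haven't tried it and would back up the table first:

-- requeue jobs stranded in 'executing' for over an hour
update oban_jobs
set state = 'available'
where state = 'executing'
  and attempted_at < now() - interval '1 hour';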
I moved Pleroma to this server about 12 days ago, but these issues only started a week ago. I'm not sure if it's relevant, but I was not able to restore the database with pg_restore: it got stuck at CREATE INDEX "public.activities_recipients_index", pinning the CPU at 100% while nothing was really happening. What I did instead was rsync the PostgreSQL data directory from the old server to the new one. That seemed to work, but I'm not sure whether it caused any side effects related to this issue.
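Since the data directory was copied over with rsync, I suppose the indexes could be checked for corruption with the amcheck extension that ships with PostgreSQL (also untested on my side):

-- verify the index pg_restore originally got stuck on
create extension if not exists amcheck;
select bt_index_check('public.activities_recipients_index');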